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Abstract 

We propose a new framework for modeling stochastic local volatility, with poten- 
tial applications to modeling derivatives on interest rates, commodities, credit, equity, 
FX etc., as well as hybrid derivatives. Our model extends the linearity-generating 
unspanned volatility term structure model by Carr et al. (2011) by adding a local 
volatility layer to it. We outline efficient numerical schemes for pricing derivatives in 
this framework for a particular four-factor specification (two "curve" factors plus two 
"volatility" factors) . We show that the dynamics of such a system can be approximated 
by a Markov chain on a two-dimensional space (Zt,Yt), where coordinates Z± and Yj 
are given by direct (Kroneker) products of values of pairs of curve and volatility fac- 
tors, respectively. The resulting Markov chain dynamics on such partly "folded" state 
space enables fast pricing by the standard backward induction. Using a nonparamet- 
ric specification of the Markov chain generator, one can accurately match arbitrary 
sets of vanilla option quotes with different strikes and maturities. Furthermore, we 
consider an alternative formulation of the model in terms of an implied time change 
process. The latter is specified nonparametrically, again enabling accurate calibration 
to arbitrary sets of vanilla option quotes. 



* Opinions expressed in this paper are those of the authors, and do not necessarily reflect the view of 
JPMorgan Chase and Numerix. 



1 Introduction 



1.1 Motivation 

The present work is motivated by the desire to have a unified modeling methodology and 
shared implementation for derivatives pricing with a dynamic volatility smile for various asset 
classes, including interest rates (IR), commodities, equities, credit, foreign exchange (FX) 
etc., as well as for modeling hybrid derivatives such as equity-IR or equity-commodities 
hybrids. We present one possible approach, which extends a recently proposed class of 
stochastic volatility models. 



1.2 Related Previous Work 



Gabaixj (2007) proposed a new class of asset price models, the so-called linearity- generating 



processes (LGP). Such processes are defined by the condition that the current prices of basic 
instruments (stock, bonds, futures, swaps etc.) are linear in a driving Markov process X t . 
This stands in sharp contrast to popular affine models where, e.g., a zero-coupon bond price 
P(t,T) is an exponentially-affine function of a Markov driver X t . 

On the theoretical side, the LGP processes appear very attractive. Indeed, typical ways 
we model basic instruments are drastically different between, say, IR and equity models^ 
In the equity world, basic instruments (equities) are linear in stochastic factors (usually 
taken to be equity prices themselves for purposes of modeling derivatives), and volatility is 
stochastic (SV) and unspanned (USV, see below for the definition of this term). 

In the IR world, the mathematics are almost the same for the HJM-type models that 
model the entire yield curve. As the yield curve is in one-to-one correspondence with bond 
prices, it can be viewed as an "observable" basic instrument that again is linear in factors, 
and typically gives rise to USV. 

But such linearity of HJM-like models has a high price, namely that the number of state 
variables needed for the Markovian dynamics turns out to be too high for use in a lattice- 
based setting in most cases of practical interest. Therefore, even with Markovian specifica- 
tions, HJM-like models are typically employed within a Monte Carlo setup rather than on a 
lattice. On the other hand, an attempt to reduce the curve modeling to a short-rate mod- 
eling, as is done in affine models, leads to a nonlinear relation between bond prices and the 
factors, which produces undesirable side effects, such as a dependence of the instantaneous 
forward curve on the short-rate volatility]^] 

This problem is resolved in the LGP approach. By putting both equity and bonds on 
equal footing in terms of making them both linear functionals of the factors (and doing 



1 Both are taken to be examples of term-structure models vs. spot-models. Instead of IR and equity, we 
could, e.g. compare commodities and FX. 

2 This is the cost one has to pay for non-linearity. Clearly nothing similar ever occurs in spot models: 
today's stock price St is obviously independent of the current volatility or current value of the volatility 
factor Y t . Mathematically, this can be formulated as the statement that for spot stochastic volatility models 
(such as e.g. the Heston model), the pricing function f(St,Y t ) of basic instruments (stocks) is an identity 
f{S t ,Y t ) = S t , see also TablefTjin Sect. M 
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it in a different way from HJM), the LGP-based models play a role of "grand unification 
models," similar in a conceptual sense to "grand unification theories" (GUT) in physics. No 
proliferation of the number of Markov drivers occurs in LGP-type models as we move from 
one class of basic instruments (stocks) to other class (bonds). 

Also on the practical side, linearity has profound consequences for tractability of asset 
pricing modeling within the LGP framework. In particular, if a zero-coupon bond price is 
linear in X t , then so will be prices of a coupon bond or a swap. As a result, the swaption 
pricing, e.g. can be done in a semi-analytical form without additional approximations, such 
as those used by the Libor Market Models (LMM). It is also very helpful in calibration, as 
will be discussed in more detail below. 

To summarize, the class of LGP-like models identified by Gabaix is a new interesting 
class that may develop into a viable competitor to both afline models, which are currently 
one of the main workhorses for derivatives modeling in credit, commodities, rates and other 
asset classes, and also HJM-type models. Yet this approach is in its infancy compared to 
the well-studied class of affme models. 

In 2011, Carr, Gabaix and Wu (CGW) proposed a LGP-type stochastic volatility term 
structure model (Carr et al. (2011 )). CGW, in particular, emphasize the point that stochastic 
volatility generated in LGP-type models is unspanned in the sense of the definition o fJColin 



Dufresne & Goldstein (2002), who coined the original term "unspanned volatility"Q The 



CGW model offers a number of attractive features. Most importantly, it is a low-dimension 
Markov model with unspanned stochastic volatility (USV), and an orthogonal set of model 
parameters with a separate calibration to the term structure and option volatilities. 

The CGW model is a pure stochastic volatility model, as volatility is modeled as a 
superposition of CIR processes. To make it more practical, it would be very useful to add a 
local volatility layer to the model. Our extension of the CGW model amounts to introduction 
of such a local volatility factor, along with efficient numerical methods for calibration and 
pricing. To differentiate our framework from CGW, in what follows we will refer to it as the 
unspanned stochastic local volatility (USLV) model. 



2 Overview of Our Framework 

By construction, USLV preserves the linearity and USV properties of the CGW approach. 
Another property inherited from the CGW model is that USLV is formulated directly in the 
physical measure P (see below) rather than in the risk-neutral measure Q, which makes it 
easier, e.g., to combine the historical and pricing data for model estimation, if desired. 

The main theoretical construction that USLV adds to the CGW model is a local volatil- 
ity layer. The resulting mixed stochastic/local volatility dynamics has a few important 
implications. 

First, adding a local volatility layer enables nearly perfect matching of an arbitrary 



Following Colin-Dufresne & Goldstein ( 2002| ), the volatility is called unspanned if bond prices do not 



depend on the stochastic volatility driver. 
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number of European vanilla option quotes with different strikes and maturities^] Such an 
extension is clearly desirable in order to apply this approach for pricing of both vanilla and 
exotic derivatives, especially if vanilla options are used to hedge the exotics. 

Second, the presence of a local volatility layer alongside a stochastic volatility part induces 
a decomposition of the option volatility into spanned and unspanned parts, rather than being 
of a pure unspanned type as in the CGW model. One could expect that such decomposition 
of volatility should translate into a decomposition of an option's vega into a delta- vega and 
a "genuine vega" part. 

Because of the way our model is calibrated, it enables traders to incorporate their view on 
the relative weights of the spanned and unspanned parts in the option's vegasj^ By viewing a 
trader's inputs as a prior model that does not necessarily match observed options, our model 
finds a minimal adjustment ("tweak") to the trader's prior model in order to reinforce an 
accurate match of the option quotes. 

In contrast, the volatility in local volatility models would be 100% spanned. In local 
volatility models, matching vanilla pricing would fix the volatility surface for all strikes and 
maturities, and would not leave any flexibility for the model to match prices of more exotic 
options. The inclusion of stochastic volatility allows one to simultaneously have more realistic 
forward smile dynamics and additional parameters to match exotics' prices (if available). The 
ability of USLV to incorporate a possible trader's view is what sets it apart from both pure 
local volatility models and pure SV models of the CGW type. 

On the implementation side, USLV concentrates on the most important low-dimensional 
specifications for practicality, e.g., two factors for the term structure (with N curve factors 
in general), and one or two factors for stochastic volatility (with M volatility factors in 
general). In particular, for a (2+2)-factor case, we show how to approximate the dynamics 
of the driving factors by a two-dimensional Markov chain on a space constructed by folding 
(see below) of the original four-dimensional state space. This enables fast pricing by standard 
backward induction on the chain. 

It should be noted that while in this paper we concentrate on modeling term-structure 
dynamics (e.g., of futures, swap rates or credit spreads) with potential applications to "term 
structure asset classes," such as IR, commodities or credit, the same approach can be used for 
modeling spot prices, which would be a proper setting for "spot asset classes," such as equities 
or FX. Moreover, due to a symmetric treatment of "term-structure assets" and "spot assets" 
in the present framework, this approach is readily available for modeling hybrid derivative 
products (e.g., equity-IR or equity-commodity hybrids) using the same implementation. 
Changes from one asset class to another would amount to a proper reparametrization and 
reinterpretation of the Markov generator matrices while leaving the computational algorithm 
intact J^] Furthermore, in the continuous limit, different parametrizations of the stochastic 

4 Note that while this property of USLV is shared by local stochastic volatility models as well, the key 
point here is that now we have an additional risk factor (volatility) to acknowledge, model and hedge. 

5 Tcchnically, this is done by giving the end user the ability to input his/her own set of speed factors (SF), 
see below. 

6 In principle, this could produce a generic pricing engine, similar in a sense to Monte Carlo (MC). Indeed, 
the latter method is a "universal" method of derivatives pricing in the sense that in this framework, we only 
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volatility generator in our Markov chain model give rise to a rich class of (2+2)-factor 
models, including stochastic local volatility with jumps. Note that in this paper we primarily 
concentrate on specifications whose continuous limit is a two-dimensional diffusion with a 
two-dimensional diffusive stochastic local volatility This case is covered in detail below in 
Sect. [6} However, in Sect. [7j we will present an alternative formulation that can give rise to 
jumps in both the underlying and stochastic volatility. Our approach is thus quite flexible 
in its ability to accommodate different specifications of the dynamics, including a four-factor 
stochastic local volatility model with jumps. 

2.1 USLV vs. HJM vs. Affine Models 

Our initial interest in using LGP-type models for modeling stochastic volatility was inspired 
by the observation that LGP-type models (and, by extension, USLV-like models) seem to 
combine the best features of both HJM-type models and affine models, while avoiding their 
disadvantages. Indeed, like the HJM-type models, the stochastic volatility is unspanned 
in USLV. Unlike the HJM-types, the model is Markov in dimension N + M rather than 
N + N(N + l)/2 + M, as in HJM-type models. Conversely, both affine models and USLV 
have the same number of state variables (N + M). However, in USLV, volatility is always 
(partly) unspanned, while in affine models, volatility in general will be spanned unless some 
special constraints are imposed on parameters, which might be restrictive for calibration 
purposes. (See also Table [I] below.) 

The above reasoning suggests that if we manage to generalize the pure stochastic volatility 
model of CGW to a stochastic local volatility model (i.e., to make a USLV out of CGW), and 
do it in a numerically efficient way, and if the resulting model demonstrates good parameters 
and hedges stability etc., then such a model can be considered a viable candidate for use 
in practice. This paper outlines the theoretical framework for USLV, leaving numerical 
experiments for future work. 

A few more words of caution are in order here. Our outline of the USLV is generic and is 
not tied yet to any specific asset class. Each asset class makes its own demands on a model. 
For example, the ability to reproduce the Samuelson effect and asset cointegration are very 
important for commodities, alongside the ability to handle seasonality in asset levels and 
volatilities for certain commodities, such as gas or power. It has yet to be seen how (or 
whether) the USLV framework can accommodate such specific requirements. A discussion 
of this matter is planned for the second stage of the present theoretical work. 

A brief summary of different model classes is presented in Table [TJ where we compare 
the behavior of equity stochastic volatility models such as the Heston model, HJM-type, 
affine-type and LGP/CGW/USLV-type. The third column shows the functional form of 

need to implement dynamic equations and payoff functions for a particular model-product combination 
in order to use a generic MC engine. Likewise, our Markov chain framework is "universal" in the same 
sense (within a class of all diffusive local stochastic volatility models in up to (2+2) dimensions). The 
only difference here is that while in MC we typically start with continuous space dynamics, which is then 
discretized for simulation through discretization of processes (e.g., Brownian motions) driving the dynamics, 
the dynamics in our approach are fundamentally defined in terms of discretized state variables. 
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Table 1: Model comparison summary. Note that "Yes" in column USV for IR HJM means 
"in general, yes," and likewise "No" for IR Affine means "in general, no, unless special 'knife- 
edge' constraints are imposed on parameters of the model." Here St, Pi and Y t stand for the 
stock and bond prices and volatility factor, respectively, while BI t and BV t stand for basic 
instruments and basic variables, respectively. Finally, D stands for for the total number of 
state variables needed for a Markovian description. 

conditional expectations arising in calculation of prices of elementary instruments. 



3 The Carr-Gabaix-Wu Model 



In this section, we provide a brief overview of the CGW model of Carr et al. (2011). The 



CWG model is then used as the first step in our setting. Simultaneously, in this section we 



set our notation, on which we largely follow Carr et al. (2011). 



3.1 State-Price Processes and Martingale Pricing 

The famous fundamental theorem of asset pricing (Harrison & Pliska ( 1981[ )) states that if 
the economy is arbitrage free, then there exists (under certain technical conditions such as 
positivity and time consistence) a strictly positive process M t called the state space deflator, 
such that the deflated gain process associated with any admissible trading strategy is a 
martingale under the measure P. In particular, for a contingent payoff at time T > t, its 
value at time t is given by the following P-conditional expectation: 



V(t,T)=E t 



M 7 
~Mt 



-Yin 



(1) 



The ratio Mx/Mt is sometimes referred to as the stochastic discount factor or the pricing 
kernel. The P-measure SDE for M t reads 



dM t 
~Ml. 



-rdt - \(t, Z t )dZ t , 



where Z t is a vector of risk factors and X(t, Z t ) measures the market prices of risk for these 
factors. The formal solution to this SDE takes a multiplicative form 



}[, ----- M,)cx\) [ - I r s ds ] S 



X(t, Z s )dZ s , 



6 



where £(■) stands for the stochastic exponential martingale operator. The latter defines the 
Radon- Nikodym derivative dQ/dP that transforms the physical measure P to the risk-neutral 
measure Q such that, under Q, the contingent claim valuation reads 



V(t,T)=El 



exp 



r„ds I II 



(2) 



3.2 One- Factor Case 



Our starting point is a version of one-factor LGP dynamics suggested in Bekker &: Bouwman 
(2009). It differs from Carr et al. (2011) by i) parametrization, and ii) derivation and inter- 
pretation of the main result of the LGP approach (see Eq.([9]) below). The two formulations 
can, however, be mapped onto one another by a proper re-parametrization. 

Assume that a state variable X t is driven by the following SDE under measure P: 

dX t = r t X t dt + a(t, X t ) [X(t, X t )dt + dW t ] , (3) 

where Wt is the standard Brownian motion and the short rate r t is an affine function of X t : 

r t = a- ce'^Xt = a- cX t (4) 

where X t = e~ tit X t is a detrended state variable with \i being the growth rate of X t , and 
a > and c > are two constant^] 

The P-measure SDE for the state price deflator M t reads 

dM t 



-r t dt - X(t, X t )dW t 



(5) 



The representation of stochastic dynamics given in Eq.Q and Eq.Q is convenient as both 
the real world (measure P) and risk-neutral (measure Q) dynamics can be directly read from 
these equations provided we know functions a(t,X t ) and X(t,X t ). Indeed, by the Girsanov 
theorem, the combination dW t + X(t, X t )dt is a Q-martingale, so that the Q-measure drift of 
X t is the short rate r t . This means that the state variable X t can be interpreted economically 
as a risky (tradable or non-tradable) asset, or be related to aggregate market returns, see 
Bekker fe Bouwman| fl20~0~9~] ) . 

Using Eq.([5j) and the fact that M t X t is a martingale under measure P (which is easy 
to verify using Ito's lemma and Eq.([3]) and Eq.([5])), we compute the conditional time-t 
expectation of Mt as follow^] 

T 



E t [Mi 



E t 



Mt 



r^Mcds 



X{s,X s )M s dW s 



M t - / aE t [M s ]ds+ / ce-^Et [M S X S ] ds 



T 



(6) 



Mi 



aE t [M s ] ds + 



ce-^ds M t X, 



7 Note that we can set c = 1 without any loss of generality. However, we prefer to keep it in the formulae 
for correct dimensionality, i.e. c has the dimensionality of r t divided by that of X t . 
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All conditional expectations E t [•] in this paper refer to measure P unless specifically stated otherwise. 
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This produces the following result for the time-t zero-coupon bond price 



P(t,T) 



Et [Mi 



J aP(t, s)ds + (J 



ce'^ds X, 



(7) 



Differentiating this relation with respect to T, we transform it into a differential equation: 

d ^fl = -aP(t,T) + ce^ T X t (8[ 



whose solution reads (Bekker & Bouwman (2009)) 



P(t,T) 



1 + ce 



-fit 



a — /x 



1 + c X t 

a — LL 



t = T — t 



(9) 



Note that this expression does not depend on specification of functions a(t, X t ) and A(i, X t ) 
in Eq.(|3]), and the shape of the yield curve at time t is driven by the current value of X t . 
This is different from affine models where the bond price is represented as a conditional 
expectation of a non-linear functional of future values of the state variables. 
The solution of Eq.pl) in the zero volatility limit a(t,X t ) = reads 







C l—rjoe 

Vo + ct ' 



-(a — fi)t 5 



Vo 



1 _ 

cX ■ 

1 

X ' 



if [A ^ a, 
if /x = a 



(10) 



In what follows, we concentrate on the case \i < a which corresponds to the scenario where 
the detrended asset X t = e~ flt X t reaches a constant level Xoo = ^— > as t — > oo. In 
addition, we assume that X > so that < i] < 1 which produces an increasing and 
concave yield curve, while negative values of rjo produce inverted yield curves, see Fig. [TJ 
In this scenario, the short rate r t approaches \x as t — > oo. If t]q satisfies a more restrictive 
constraint r] Q < fi/a, then the short rate r t stays positive (or equivalently P(t,T) < 1) for 
all t. Therefore, we assume a slightly more general constraint on t]q by introducing another 
parameter JJl that lies in the interval [/i, a] : 



o < m < - , 

a 



[i < /j < a 



For such choice of r/ , Eq.(10) with /x < a produces a concave and monotonically increasing 

-ows: 

(12) 



yield curve, while the short rate r t and state process X t are bounded at all times as follows: 
a — /x a — u . ~_ . a — u 1 



a — 



(/x/a) e-( a -^)* 



(/x/a) e-( a -^)* 



where the low and upper boundaries for r t (respectively, upper and lower boundaries for X t ) 
are attained by setting r] = Jx/a or r] = 0, respectively. One sees that parameter JJl can be 
used to control how much the short rate r t can go negative. If we set /x = /x, the short rate 
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Figure 1: Yield curves obtained with a one-factor specification Eq.(|9j) with t — for different 
combinations of parameters a, fi and rjo, as a function of maturity in years. Parameter c 
equals 0.1 for all curves. 



r t is non-negative at all times t > 0, but as we will see shortly such choice for ft may not 
necessarily be optimal in the present framework. 

Now we return to Eq. (J3]) with non- vanishing volatility, and consider the following ansatz 
forX i: 

, ji < a (13) 



a — fi 



c 1 - Yte-^-^ 1 



which is obtained by replacing a constant r/ in Eq.( 10 ) by a stochastic process Y t restricted to 



the strip [0, fi/a]. Such a process with a given initial value < Y < n/a can be constructed 
as follows: 

Yt = ^ — (14) 

aY + {Ji — aY )Z t 

where Z t > is a non-negative martingale under measure P that starts at Zq — 1 and 
satisfies the following SDE: 



dZ^ 



-a(t,Z t )dW t 



(15) 



where a(t, Z t ) is a local volatility function that will be specified later. The initial value X 
is expressed in terms of Y as follows: 



a — ji 



1-Fn 



(16) 
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Substituting Eq.(14) into Eq.(13), we obtain 
where 



a — yU 



e"* [aY + (J1- aY ) Z t 



(a + pZ t 



c aY [1 - (n/^e-^-^ 1 } +(fi- aY ) Z t c an{t) + Q Z t 



a 



aY , (3 = fi — a , n(t) = 1 



a 



(17) 



(1* 



Eq.(17) defines X t as a fractional-linear function^] of the martingale process Z t determined 
by Eq.(15). Clearly, the process X t defined by Eq.(|i~7"|) is bounded, as it should be as long 
as it is related by a linear equation Eq. ^ to the bond price which is boundec 10 By taking 
the limits Z t — » and Z t — > oo, one obtains bounds for X t : 



a -^<x t < a -^ 



1 - (^/a)e-( a -^)* 



(19) 



which coincides with Eq.(12). Note that Carr et al. (2011) obtain a fractional-linear ex- 



pression for X t as a function of Z t that is similar to our Eq.(17), however their choice of 
time-dependent coefficients is different. 



Using Ito's lemma, we find that X t defined by Eq.(17) satisfies Eq.([3]) provided we choose 
the following specification for functions a(t,X t ) and \(t,X t ) (which we express here as 
functions of Z t rather than X t ): 



Ht,x t ] 



aa(t, Z t 



a(t, Z t 



fi a 



c [ a «(t) + PZt 



0Z t 



(20) 



Note that volatility a(t,X t ) vanishes in both limits Z t — > or Z t — > oo which correspond to 
boundaries in the X-space given by Eq.(19). Finally, the solution of Eq.([5]) reads 

[an(t) + 0Z t ] 



(21) 



a + (3 

where we normalize M t to enforce the constraint Mo = 1. Therefore, the state price deflator 
M t is a linear functional of the Markov driver Z t . Linearity of this expression is exactly the 
reason we do not have any convexity corrections in Eq.(|9]). For any nonlinear functional a 
resulting expression for P(t,T) would depend on volatility of Z t . Such behavior is inten- 



tionally avoided in the present framework. Further note that using Eq.(21), Eq.(17) can be 
re-written as follows: 

a-H a + (3Z t 
c{a + f3) M t { ' 



Xt 



9 This function is also known as a Mobius transformation in complex analysis. 

10 One can notice here a certain conceptual similarity between the LGP and Markov functional models on 
the one hand, and a difference with the affine models on the other hand. For the latter, the driving SDE has 
afhne drift and diffusion coefficients, while the function f(X tl Y t ) (see TableJIJ is nonlinear (nonaffine). For 
the LGP-type models, e.g. the CGW model, the situation is reversed: the state equation is now nonaffine 
but the pricing equation is affine. 
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Therefore, for contingent payoffs that are linear in Xt, the conditional expectation in 
Eq. ([!]) reduces to the expectation of a linear functional of Zjp] This property of the model 
can be used, in particular, in order to reduce the swaption pricing to a single option on a 



future value Z T , see Carr et al. (2011) for more details 



Note that as t — > oo, the detrended process X t = e ^ l X t with X t defined by Eq.(17) 
converges to the following equilibrium value: 

X 00 = a -^ (23) 

c 

irrespective of the behavior of the process Z t . In other words, stochasticity dies off in the 
one-factor LGP model, while the onset of this asymptotic regime is attained at times t where 
n(t) approaches unity: 

£ e -(«-A0t i t > _J_ (24) 
a a — fi 

The resulting asymptotic extinction of volatility of the process X t as t — > oo appears to be a 
theoretical limitation of a one-factor LGP framework. This behavior is illustrated in Fig. [2] 
where we show a few paths of a simulated short rate process. As will be shown in the next 
section, a two-factor formulation allows one to build dynamics where stochasticity does not 
die off as t — > oo. Another limitation of the one-factor formulation is that it is only able to 
produce monotonic (increasing or decreasing) yield curves. In order to get a humped-shaped 
yield curve, one needs at least a two factor specification. 

On the practical side, the effect of decaying stochasticity of X t in the one-factor formu- 
lation can be somewhat mitigated by a suitable choice of parameters. In particular, if the 
yield curve is not too steep, by taking p, > fi, we can admit some negative short rates in the 
short term while guaranteeing that r t stays positive for longer maturities, as shown in Fig. [2] 
where r t can go negative for t < 2 years. Such simple trick allows one to somewhat delay 
shrinkage of the support region of r t for larger time horizons. Therefore, we expect that the 
one-factor version of the model can still be used for monotonic yield curves with sufficiently 
short time horizons. 



3.3 Extension to a Multi-Factor Case 



The previous formulation treats a one-factor case N — 1. For an arbitrary number of factors 



N > 1, the state vector ~X. t = (X 



it, 



, X^t) satisfies the following equation: 



dX 4 = r t X t dt + a(t, X t ) [\{t, X t )dt + dW 



(25) 



where W t is a iV-dimensional standard Brownian motion and the short rate r t is an affine 
function of X t : 



N N 

c ^2 e'^Xu = a-c 



(26) 



8=1 



i=l 
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In addition, Eq.(22| shows self-consistency of our approach: while we know from Eq.(|3| and Eq.([5|) that 
the product M t X t is a martingale, Eq.(22| shows that this product is proportional to an affine function 
a + pZt of a martingale Z t , which is a martingale. 
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Figure 2: Simulated paths of the short rate r t as a function of calendar time t (in years) 
for the one-factor LGP process Eq.(17), for the following choice of parameters: a = 0.085, 
fi = 0.080, ft = 0.081, Y = 0.909 and a = 0.6. 



where Xa = e~^ lt Xi t are detrended state variables with being the corresponding growth 
rates, and a > and c > are non-negative constants. The P-measure SDE for the state 
price deflator M t reads 

^ = -r t dt-X(t,X t )dW t (27) 



The solution to this equation generalizes Eq.(|21|): 



YZl [<XiKi(t)+PiZ t 



E l =i(«i + A) 



(28) 



where parameters a^, /?, and functions Ki(t) with i — 1, . . . , N are defined similarly to Eq.( 18). 



Finally, the solution for the bond price in a iV-factor model generalizes Eq.([9]) (Bekker & 



Bouwman (2009)): 



P(t,T) 



N 
i=l 



p (a-m)T _ i 
a — 



it 



N 



1 + c^^ -X lt 



i=i 



a — /ij 



(29) 



where r — T — t. As can be seen from Eq.(29), only the factor with the smallest value of 
contributes to this expression in the limit T — > oo. In our analysis below we set X = 2 
and assume that fi2 < Hi- Therefore, we interpret the second factor X2t as a long term 
factor, while X u serves as a short-term factor that drives, jointly with X 2 t, the short-term 
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part of the yield curve. A two-factor specification gives rise to a richer set of yield curves 
than a one-factor one, including, in particular, humped curves or curves with sign-changing 
convexity which could not be obtained with a one-factor specification. 

Another key difference of a multi-factor case from a one-factor one is that volatility 
does not necessarily die off in the long run. We can illustrate this for N = 2. Proceeding 



similarly to the one-factor case, we first solve the system Eq.(25) without noise, by setting 
<r(t,X t ) = 0. We obtain 



X 



[I! e -(Mi-a)* 



It 



U 2 — a e ~(M2-a)t 

X 2t = Vo - eo _ f7oe _ (w _ a)t + e _ (w _ fl)t (30) 

where 770 and £0 are two constants that are determined by initial conditions. In what follows, 
we assume that /ii > a and a < //2 < as suggested by a numerical example below. For 
such a scenario, the following constraints on parameters 770 and £o : 

< 770 < 1 , £o>l (31) 



guarantee that Eq.(30) is well defined at all times. Additional constraints on parameters rjo 
and £0 arise if we enforce positivity of the short rate r t = a — cX\ t — cX 2 t- We will omit 
details that can be reproduced using similar steps to the above one-factor case, and instead 
concentrate on showing that volatility in a two-factor (or, more generally, a multi-factor) 
LGP-type model does not in general die off in the long term. 

To this end, we note that a solution of a full N = 2 model with non-vanishing volatility 
cr(t, Xt) 7^ can be obtained along similar lines to the previous N = 1 case. More specifically, 



we should replace constants £ and r/ in the deterministic solution Eq.(30) by judiciously 



chosen fractional-linear functions of a two-dimensional martingale process Z t = (Z lt ,Z 2t ). 



The latter satisfies a two-factor generalization of Eq.(15), and has the initial value Zq = (1, 1 



On the other hand, assuming that \i\ > a and a < \i 2 < /ii, the asymptotic behavior of the 



deterministic solution Eq.(30) as t — > 00 is as follows: 



X lt ^ile-O*-)* , X 2t ^^5» e -(«-)* (32) 

This suggests that volatility of factors X u X 2 t does not necessarily get extinct in the long 
run, unlike the behavior observed in the one- factor case. An example of a calibrated yield 
curve is shown in Fig. [3j 



4 The USLV Model 



Until this point, our formalism has largely followed the CGW model by Carr et aLl ( 2011 ) 
(albeit using the formulation of LGP dynamics proposed by Bekker & Bouwman (2009)). 
Now it is time to part ways. 
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Yield curve fit in USLV(2,0) 




10 15 20 

maturity 



Figure 3: Calibrated yield curve (as a function of maturity in years) on 03/15/13 in the two 
factor model. Calibrated parameters are as follows: a = 0.041, = 0.520, = 0.382, X\ = 
-0.151, X 2 = 0.188. 



CGW investigate a parametric stochastic volatility (SV) specification of the dynamics of 
Z t . In that specification, the stock volatility is obtained by a stochastic time change with 
an activity rate process given by a superposition of CIR processes. 

Our plan is different. We want to stick to simple one- or two- factor specifications of 
the stochastic volatility process, while concentrating on modeling a nonparametric local 



volatility layer in Eq. ( 15 ) in such a way that all observable option quotes would be exactly 
matched. Furthermore, our approach is necessarily numerical and is based on a Markov 
chain approximation to the dynamics of the martingale Z t . 

We will construct our model in two steps. In the first step, we develop a discretized 
nonparametric local volatility version of USLV that corresponds to a zero vol-of-vol limit of 
the full-blown model. Calibration to option quotes in this framework is achieved via a set of 
multiplicative adjustment factors acting on elements of the Markov generator (see below). 
We will refer to these adjustment factors as the one-dimensional (ID) Speed Factors (SFs). 
Calibration of such a one-dimensional USLV model amounts to computing a set of ID SFs. 

In the second step, we move on to a full-blown USLV model with a nonzero vol-of-vol by 
turning stochastic volatility on. In the present discrete-space framework, this amounts to 
making the Markov chain generator stochastic. 

To retain a near-perfect calibration to a set of option quotes obtained in the first (ID) 
step, we introduce another set of speed factors which we will refer to as 2D speed factors 
(2D SFs). The 2D SFs are then calibrated from the previously computed ID SFs using 
a version of the Markovian projection method implemented in an efficient manner using 
forward induction on the Markov chain. Once calibrated, the resulting 2D Markov chain can 
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be utilized to set up efficient pricing schemes for derivatives based on backward-induction 
algorithms. 

We note that the resulting discrete-space continuous-time dynamics on a Markov chain 
with a stochastic generator arising in our approach resembles the BSLP model developed 



for portfolio credit derivatives in Arnsdorf & Halperin (2007). The two models are similar 



in that both use a two-step approach to calibration. In addition, both the definition and 
parametrization of our speed factors are similar to how analogously defined contagion factors 
are introduced and used in the BSLP model. The difference of the USLV model from the 



BSLP model is that while a discrete-space description is exact for credit.^ it is used as an 
approximation to the dynamics of the underlying for USLV. Furthermore, while the BSLP 
model uses a non-linear death process as a model for a portfolio loss, in the present case we 
use a nonlinear quasi-birth- death (QBD) process as a discrete approximation to the dynamics 

of a two-dimensional Markov driver Z t = [z^\ zf^ . Finally, we use a different method 
for an efficient computation of matrix exponentials arising in evaluation probabilities on 
Markov chains. 

In what follows, we use the following compact notation for different flavors of our model. 
We denote one- and two-factor discretized local volatility models as USLV(1,0) and USLV(2,0), 
respectively. Versions with stochastic volatility are denoted as USLV(1,1), (2,1), or (2,2). 

We call a discretized process for Z t = \Z^\ Zj®j a ID process, while the joint process of 
Z t and stochastic volatility is referred to as a 2D process. 

4.1 USLV(1,0): One-Factor Local Volatility 

We consider the following SDE describing a local volatility dynamics of a one-dimensional 
Markov driver Z t under the P-measure: 

i rv 

— ^ = a(Z t ,t)dW t , (33) 
where Wt is a Brownian motion and a(Z t , t) is a local volatility. Our objective is to discretize 



the dynamics corresponding to Eq.(33). 



To this end, we first construct a nonuniform grid of possible values of the martingale 
Z t > with Zq = 1. Let p be the number of points on the grid and < Zq < Z\ < ■ ■ ■ < z v _\ 
be the nodes on the grid. Irregularity of the grid allows making it denser in interesting 
regions, and sparser in uninteresting ones. Clearly, the fact that our process is a martingale 
helps as our grid should not be too large: as time passes, the underlying stays around the 
current value in the sense of expectations. 

Let the current state be Zi and let Az^-i = Z{ — i G [l,p — 1] be the ith space 
interval on the grid. We construct the elements of the generator matrix A t following the 



adaptive Markov chain approximation of Cerrato et al. (2011). Adapting their formulae to 



12 Provided some additional assumptions are made, such as a discrete spectrum of recovery values. 
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our case of a zero-drift diffusion Z tl we obtain the Markov chain generator 



/ ctoo a oi 

Clio Cbll Q-Y2 



A 








with elements 








Si{t) 



a p-2,p-3 0-p-2,p-2 a p-2,p-l 
<2p-l,p-2 a p -l,p-l J 



(34) 



St(t) 



—a 



H,i-\-l j 



0, j > 1. 



(35) 



where we defined a grid-valued set of speed factors (SFs) 

Si {t) =6- 2 { Zl ,t). (36) 
Now we have a Markov generator parametrized by the pointwise set of speed factors in 



Eq.(36). Next we will show how the SFs in Eq.(36) are turned into tunable parameters and 



used for calibration to option quotes. 



4.2 Parametrization of Speed Factors in USLV(1,0) 



As it stands, the parametrization in Eq.(34) is very general. The number of free parameters 



per a given time slice t is p, typically far exceeding the number of observed option quotes 
available for calibration for cases of practical interest, 
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To achieve an exact match between the number of free parameters in our model and 
the number of available option quotes, we consider the following parametrization of our ID 



speed factors Eq.(36). Our approach here is similar to how contagion factors are used in the 



BSLP model of Arnsdorf & Halperin (2007). 



We assume that as a function of time t, the SFs Si(t) are piecewise constant between ma- 
turities of traded options. This considerably simplifies computation of matrix exponentials. 

For the dependence of Sj(t) on the grid index i (i.e., for the z dependence), we proceed 
as follows. Let K , K x , . . . , K g _i be a set of strikes for traded options across all maturities, 



expressed in terms of the Z-space as, e.g., in Proposition 3 of Carr et al. (2011). We assume 
that all these strikes correspond to q different nodes on our gridj 14 | We then model the 
speed factors Sj for all values of i by picking exactly q free values §q, . . . , s q -\ at locations 
K Q , Ki, . . . , K q _i, and using linear interpolation for points in between. In other words, our 
speed factors Sj(i) are piecewise-linear in z i5 while the anchor points at q selected nodes serve 
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Typical values of p that we have in mind for practical implementation are around 20 to 100; see also 



Cerrato et al. (2011). 

i4 This is because our grid is constructed in such a way that it puts all strikes exactly at some nodes, plus 
add some nodes in between and beyond the range of quoted strikes. 
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as free parameters for calibration to option quotes. Any consistent set of option quotes can 
be exactly matched by the present method]^] 

Note that calibration of local volatility models without assuming availability of a complete 
set of option quotes (i.e., when the number of option quotes is less then the number of nodes 
on a grid) has been previously discussed in the literature. In particular, a recent paper by 



Lipton & Sepp (2011) analyzes a setting where the local volatility function is piecewise-flat 



between quoted strikes (or between mid-points) using Laplace-transform based methods. 
Unlike their method, which is exact in ID, our approach is based on numerical optimization, 
but it is extendable to a multivariate setting (2D and higher) along the same lines as in ID. 
In addition, a piecewise-linear volatility model appears to be a less drastic approximation 
than a piecewise-flat model. 



4.3 Calibration of the USLV(1,0) Model 

Calibration of the speed factors Sj(t) in the above setting is straightforward. The anchor 
points introduced above serve as parameters of optimization. Given a multidimensional op- 
timizer, at each iteration we first construct the generator matrix given the current set of the 
anchor points. After that, finite-time probability distributions are computed by taking ma- 
trix exponentials of the generator. As the mathematical structure of our model is essentially 
the same for the N = 1 and N = 2 cases (one or two factors for the term structure), we 
postpone presenting details of this procedure until the next section where we introduce an 
N = 2 version of our model. Theoretical option prices for a given set of model parameters 
are then computed using these probability distributions. Finally the optimizer adjusts the 
current set of free parameters to decrease the error between the model and the market. 



5 USLV(2,0): Two Curve Factors 

With two factors for the curve (N = 2), we assume the following vector- valued SDE for the 
dynamics of Z t = (Z^,Z^) T : 

( dZ? \_ ( E U S 12 \ ( dW t W \ 

{ dZ? ) ~ V S 22 ) [ dW ^ ) ' (3?) 

where two Brownian motions are independent, and the volatility matrix £ = S(z) 

is defined as follows: 

= V^J) ( ) , S(z)E(zf = S (M) ( ;l "jf ) . (38) 

15 Alternatively, if the number of liquid option quotes per maturity is large while the calibration speed is 
an issue, then some strikes can be omitted from the set of anchor points at the price of giving up an exact 
calibration for these omitted strikes, while keeping an exact calibration for the other strikes. For example, 
one can be guided by the size of quoted bid-ask spreads in deciding which strikes could be skipped without 
much sacrifice to accuracy while gaining in performance. To preserve no-arbitrage for the omitted strikes, 
one should use a monotonic interpolation in the probability space. 



17 



Here s(z, t) = s(Z 4 , t\Z t = z) > is a scalar function of the state variable Z t = ( , Z\ j , 
and z = (zi, z 2 ) are the values of Z t at time t. An explicit specification of this function will 
be given below 

correlated with correlation coefficient p 



Note that components Zf^ and zj® defined by Eq.(|37l) and Eq.(|38l) are 



Our first objective is to approximate the dynamics given by Eq.(37) by a 2D Markov 



given by Eq.(37): 



chain. To this end, we start with a Markov generator corresponding to the 2D diffusion 



»J=1 



(39) 



where Viz) = V(zt\z) is an arbitrary function (a value function or a transition density) 
of backward variables z (with forward variables zt treated as parameters). The generator 
specifies the continuous-space backward equation 



dV(z,t) 
dt 



-CV (z,t) 



Note that in order to be probabilistically interpretable as a generator of a Markov chain, 
a discrete version A of the generator C should have all nondiagonal elements nonnegative, 
and all diagonal elements negative, such that all rows sum up to one. These remarks are 
important as not any discretization of £ gives rise to a valid Markov chain generator. For 
example, using a central divided difference to approximate the mixed derivatives in Eq.(|39| 
would not preserve nonnegativity of nondiagonal elements of A. 

With these remarks in mind, and given a two-dimensional eric ^ of values of {Z^\Z^) 
with p nodes per dimension, we approximate second derivatives by divided differences. 
Derivatives V ZkZk , k = 1,2, are represented using the central differences 



d 2 V 



d 2 v 



2K: 



I -J 



Vi-i d 



Vi 



(A^) 2 



+ 0((A, 1 ) 2 ) 
+ 0{(Az 2 ) 2 ) 



1.1 



(Az 2 ) 2 

while for the mixed derivative we take uncentered differences which preserve nonnegativity: 
d 2 V - V i+ltj - (V i>j+1 - 2V l3 + Vij-t) - (K-ij - K-ij-i) 



dzidz 2 



'.1 



2Az x Az 2 



+ O {A Zl Az 2 ) + O ({A Zl ) 2 ) + O {{Az 2 f) , p > 0, 
d 2 V V i+hj - Vi+u-! + (V iJ+1 - 2V tJ + Vij-j) - - Vi- ltj ) 



dz\dz 2 



ij 



2AziAz 2 

O (A Zl Az 2 ) + O {{A Zl ) 2 ) + O {{Az 2 ) 2 ) , p < 0. 



id 



For simplicity, in this section we assume that our one-dimensional grids are uniform with the same 



number p of grid points per each dimension. Therefore, z. 



8z2, Vi = 1, N, j € [!,£>)• For analysis of a nonuniform grid, see Appendix [A[ 



Szi, Vj = 1,N, i G [l,p), Zi tj+ x- 
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Using this in Eq.(39) and regrouping terms, we obtain 



fc,mG{-l,0,l} 



where we introduced the following compact notation: 



- I a (A ( ^ |/P|0l02 

a ij\i,j+l ~~ a «il«J-l ~~ n S ij\ l ) I / » n2 



2 JW V(A2 2 ) 2 

*iM = ( ^2 " + ^2 ) > (ID 



2o ' ' 



if p < 0, 

if p > 

2 AziAz 2 



where s y (t) = [s(Z t ,t)] ir 

To ensure that all parameters aij\i + k,j+ m , k, m ^ are nonnegative, we impose the 
following constraint on the step size Az2 given a chosen step Azi: 

A*i < As 2 < y^-Azi. (42) 



Assuming Eq.(42) is satisfied, we can interpret Eq.(40) as a generator matrix of a 2D Markov 
chain. We can write it in a matrix form as follows: 

{CV{z)) l0 = [AV] l3 = Awf^, (43) 

i'J' 

As we deal with a two-factor setting, the matrix elements of the generator A carry four 
indices rather than two. To sum over two indices corresponding to the Z^- and Z^-states 
in Eq.(43), it is convenient to group all transitions according to the change of one variable 
(e.g., ZW). We obtain 

(CV(z)) i:j = ^ Ajj^fVjrf = ^ Ajjii^Vi-iji I ^ A;, :y\',y + /, ] Aj\i+lJ'Vi+lJ' 
i'J' j' j' j' 

= [{aijii-xj-iVi-ij-i + aij\i-ijVi-ij + ajj|j_ij + iT^_ij + i| 

+ \ a ij\i,j-lVi,j-l + a ii\ijVij + a «i|«,j+i^i,i+i} (44) 
+ \Q>ij\i+i,j-i Vj+ij-i + aij\i + ijVi + ij + aij\i + ij + iVi + ij + i}~\ . 

Here terms in the first, second, and third row correspond to transitions i — >■ i — 1, i — )■ i, and 
z — M + 1 in the Z^-dimension, respectively. 
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Mathematically, this is expressed via the following tridiagonal block-matrix form for the 
resulting "one-dimensional" Markov chain generator A: 



A 



( m 

£>(D 





F {0) o 

5 (2) L (2) 






i?(2) 















V 







(45) 



where all matrices L^ l \ have dimension p x p, i.e., the dimension of our one- 
dimensional grids j^] Explicit expressions for these matrices can be found from Eq.(44): 



n 3,3-1 

L 3,3-1 
f(0 



»i[i-lj-l> ~~ a ij\i-t,ji n j,j+l 



B 



(0 



a 



a 



(46) 



(0 



while all other elements of these matrices vanish. Note that this implies that the generator 
Eq.(45) is "doubly" sparse, as matrices B^\ and F^ l > are themselves sparse; see also a 



comment at the end of this section. 



The block-tridiagonal matrix structure Eq.(45) of the Markov chain generator A is char- 



acteristic of so-called quasi-birth-death (QBD) processes. A QBD process is a bivariate 
Markov chain of a special type of dynamics of two components. The first component, Zf~\ 
called the "level," follows a birth-and-death (BD) process on either a finite or infinite set 
of states. Conditional on the realization of the Zj^-component at a given step [t,t + dt], 

(2) 

the second component Z\ , called the "phase," follow s another Markov process. For a short 

(2) 



Kharoufeh 



(2011). Note that in our particular case, Z\ 



review of QBD processes, see, e.g. 
follows another BD process, while the support of Z^ is finite. QBD processes with finite 
support are called finite QBD processes. 



The symbols L, B and F in Eq.(45) stand for local (without change of level), backward 
and forward (the level is changed by one unit up or down) moves, respectively. Note that as 
long as the matrices L^\B^ and depend on level i via the discretized local volatility 
function sy, our QBD process with generator Eq.(45) is a level- dependent (or nonlinear) 



finite QBD, denoted sometimes as a (finite) LDQBD in the literature. 



It is easy to check that Eq.(45) with elements defined as in Eq.(41) is a valid Markov 



generator where all off-diagonal elements are positive, diagonal elements are negative and 
each row in A sums to zero. 



After the QBD generator matrix is constructed according to Eq.(45), a matrix P of 
finite-time transition probabilities with matrix elements 



Pij\i'j' (t,T) = P [Z T \ Z T ^ — z\, ; , } \Z\ \ Z 



,(1) J2)| 7 (l) y(2) 



M) -(2) 



(47) 



17 If grids in z^ and z^ have different lengths p\ and P2, then the size of these matrices will be P2 x P2- 
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can be computed by solving the forward Kolmogorov equation 



dP 
df 



PA. 



(4£ 



For a given interval [£1,^2] where the generator A does not depend on time, the solution of 
the forward equation is 

P = P tl e&-^ A , 

where P tl is a state vector at time t±, and e x stands for the matrix exponential of X. 

A few remarks on the complexity of the method just presented are in order here. We 



have managed to map the two-factor continuous-space dynamics Eq.(37) on the state space 



Z\ x Z 2 onto a QBD process with generator Eq.(45). The latter can formally be viewed 



as a one- dimensional Markov chain in an extended linear space whose basis is formed by 
elements of a Kroneker product of grid values Zjp ® (and properly rearranged to form a 
QBD structure). Therefore, computation of transition probabilities Eq.(48 ) in our two-factor 



model is, at least formally, as simple as a corresponding calculation for a one-factor model, 
and reduces to computation of a single matrix exponential, albeit of a larger matrix. 



is 



While naively the generator A has 0(p 4 ) free parameters, their actual number is much 
lower due to sparsity of the matrix. It is simple to find that the number of nonzero elements 
that need to be stored scales as (3p — 2)p + 2(2p — l) 2 . For example, for p = 100 our matrix 
A would be of size 10000 x 10000 with only 109,002 non-zero elements. Matrices of such 
sizes can well be handled by modern matrix exponentiation methods (see below) J^j 

5.1 Transient Probabilities of QBD Chain by Randomization 

It is well known that a direct computation of a matrix exponential e tA with a Markov 
generator A via a straightforward use of a Taylor series expansion as X]^=o(^) n / n ' ^ s ^ n 
general not a good idea (see Moler & van Loan (2003)). The main reason for this is that 



severe roundoff errors might accumulate (especially when the matrix is large) due to the 
fact that the generator has both positive and negative entries. In addition, matrices (tA) n 
become nonsparse even if the original matrix A is sparse, as is the case for the QBD process. 
An efficient method of choice for dealing with matrix exponentials for large matrices is 



known as Jensen's randomization; see, e.g., Gross & Miller (1984), or Haverkort (2001) for 



a more recent review. The method proceeds as follows. We start with choosing a parameter 
A > maxj { | An | } , and define the matrix 



1 + 



A 



=>A = \(P-J) 



(49) 



18 Here in addition to various efficient algorithms for computing a matrix exponential of a sparse matrix, 
one could use splitting in different dimensions that would take into account a block-diagonal form of the 
generator matrix A. 

19 While a randomization method that we describe in the following section may not be the most efficient 



method when the L2 norm of matrix A is large ( Sidje & Stewart ( 1999 )), it is very convenient for introducing 
stochastic volatility via a stochastic time change. We therefore stick to this approach in what follows. 
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With our choice of A, all entries of P are between and 1, and all rows sum to 1. This 
means that P is a stochastic matrix that describes a discrete-time Markov chain "related" 
to the original continuous-time Markov chain with generator A. We now want to discuss in 
more detail the sense in which these two Markov chains are "related." 



To this end, we substitute A as given by Eq. (|49|) into the solution of the forward equation: 

P(t) 



P>e tA 



pi At(P-I) 

_r e 



P' Q e 



Using a Taylor series expansion for the matrix exponential in this expression, we obtain 

Pit) = P^- Xt E -^ir- = p ^ ^ n)p "' (50) 



n=0 



where 



ip(Xt, n) 



n=0 



n e N, 



nl 



are Poisson probabilities, i.e., probabilities of observing n events by time t for a Poisson 
process with intensity A. Note that a naive Taylor expansion of the matrix exponential 
e tA behaves badly, but the new expansion is much better behaved: roundoff errors are now 
largely eliminated as all entries of matrix P are between and 1. Moreover, different terms 
are weighted by the Poisson probabilities, so that the expansion is expected to converge fast 
when the product \t is not too large. 



We note that the construction given by Eq.(50) can be interpreted as a discrete-time 
Markov chain (DTMC) Y n (n = 0,1,..., p) with transition matrix P subordinated to a 
Poisson process N t where the latter serves as a randomized "operational time" for Y n , so 
that the subordinated process is now defined as X t = Y/v t (see, e.g., Feller (1968)). We will 



return to the topic of subordinated processes in Sect. 7.1 



It is important to point out that the method just presented can be used without a 
matrix-matrix multiplication (as Eq.(50) would naively suggest). Let if n be the probability 



distribution vector in the DTMC with transition matrix P after n epochs. This vector can 
be computed recursively: 



7T0 = Po, 

7T n = 7T n _ 1 P, n G N + . 



(51) 



Using the vector fc n , the state probabilities Eq.(50) in the original CTMC are computed as 
follows: 

oo oo 

Pit) = ]>>(At,n) (P^P n ) = 5>(A*,n)7r n . 

n=0 n=0 

Therefore, computationally the algorithm amounts to a series of vector-matrix multiplica- 
tions that can be done very efficiently for matrices of sizes typical for our problem. Moreover, 



the recursive procedure of Eq.(51) preserves the sparsity of P, thus enabling a substantial 



acceleration of the vector-matrix multiplication. 



In practice, the infinite sum in Eq.(50) is truncated at some value n max . This value can 



be adaptively controlled within the algorithm itself, as a theoretical upper bound for an error 
resulting from the truncation is available as discussed, e.g., by Gross and Miller (1984). 
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5.2 Calibration of Speed Factors in USLV(2,0) 



Summarizing our results for the USLV(2,0) model so far, we see that the mathematical 
structure of the (2,0) model is similar to that of the (1,0) model. Indeed, for the latter our 
Markov chain construction gives rise to a nonlinear birth-death (BD) process modulated by 
ID speed factors (SFs) Sj(t). After a proper parametrization as described in Sect. 4.2, SFs are 
calibrated to market option quotes. For the (2,0) case (two factors for the term structure), 
the resulting Markov chain is a nonlinear quasi-birth- death process, again modulated by a 
set of SFs Sij. In terms of computational complexity, the two cases are essentially the same, 
as both amount to calculation of matrix exponentials of a Markov chain generator (albeit in 
different dimensions). 

Calibration of the (2,0) model is done by the fitting function s(z,t) in Eq.(37). To this 
end, we proceed similarly to the one-factor case. We assume that function s(z, t) = s(zi, z%, t) 
is a function of single argument, s(zi,z 2 ,t) = s(a,\Zi + otiz-z, t), where a±,a 2 > are some 
weights (e.g., a.\ = a 2 = 0.5). For calibration purposes, we could parametrize this function 
in a piecewise-linear way, i.e., in exactly the same way as we did before for the N = 1 model. 
The number of free parameters (anchor points) and their locations would be chosen based 
on a particular set of instruments available for calibration. 

To continue with our theoretical construction of the model, in what follows we assume 
that the stage of construction of a (2,0) (or (1,0)) version of the USLV model is completed 
along the lines described here. In what follows, we refer to these SFs as ID SFs, in order to 
differentiate them from another set of speed factors (2D SFs) that will be introduced below 
when we add stochastic volatility to the model. 

Finally, we note that while the main purpose of the USLV(2,0) model for our purposes 
is to use it as a building block in the construction of a full-blown USLV(2,2) model with 
stochastic volatility, the pure local volatility USLV(2,0) model can also be useful in its own 
right, e.g., as a way of pricing European vanilla options with illiquid strikes in terms of prices 
of liquidly traded options. 



6 USLV(2,2): Two Curves and Two Volatility Factors 

Once we have a calibrated USLV(2,0) model, introduction of stochastic volatility in this 
framework amounts to two things: (i) introducing new dynamics for volatility drivers, and (ii) 
making sure the model still calibrates to available option prices. This produces a calibrated 
USLV(2,2) model. 

Let us note that stochastic volatility dynamics can be introduced in our framework in two 
ways. In the first approach, we follow the formulation of a continuous-time Markov chain 
(CTMC) dynamics, which we now augment by 2D dynamics of "spot" variance factors. For 
numerical implementation, the model is then put on a time grid [to,ti, . . . , t n ] with a uniform 
time step At. All calculations (see below) are done to 0(At 2 ) accuracy, which assumes that 
At should be sufficiently smallj^] With this method, we solve the forward and backward 

20 For example, we might need to use daily or more frequent steps, depending on the level of volatility, 
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Kolmogorov equations one step At at a time, similarly to finite difference methods. 

In the second approach, we deal with arbitrary time lines which do not necessarily have 
small time steps. For example, we may want to model the values of underlying factors only 
on a sparse set of "interesting" dates (e.g., coupon dates, call dates etc.). Essentially, by 
taking matrix exponentials of the generator, we aim here to achieve a functionality similar 
to the USLV(1,0) and USLV(2,0) models (or any CTMC model, for that matter), which are 
capable of computing finite-time transition probabilities directlyp] 

Respectively, in what follows we present two versions of the USLV(2,2) model. In the 

first version, we assume a Markov dynamics in the pair (Z t ,Y t ) where Y t = [Y^jY} 2 ^ 

is a bivariate "spot" variance driver. In the second version, we instead assume a Markov 
dynamics in a pair of Z t and an integrated bivariate variance, or, more generally, a bivariate 

stochastic time subordinator (see below). We will use the notation T t = ,T t (2) ) for 

the latter in what follows. For reasons that will become clear shortly, we will refer to these 
two versions of the model as the activity-rate-based model (AR-USLV), or the implied-time- 
change-b&sed model (ITC-USLV), respectively. 

On the theoretical side, it turns out that both approaches can be viewed in a unified way 
by interpreting them as particular realizations of a stochastic time change of the original 



QBD Markov chain. We will give more details on this below in Sect. 7.1 

On the practical side, we can choose between two numerical methods. With the first 
method, we can implement both the AR- and ITC-versions of our model in a similar way 
using a version of the Markovian projection method. The latter reduces calibration of 
USLV(2,2) to a fast forward induction method in what is essentially a ID problem, without 
a need for a forward induction on a full 2D Markov chainH No new optimization in addition 
to one performed at the stage of calibration of the USLV(2,0) model is involved here. There- 
fore, the method is very fast on each given time step, the only potential bottleneck being 
the necessity to perform such computation repeatedly on a dense time grid. The method is 
nonparametric in that it solves the problem of calibration of the full-blown USLV(2,2) model 
via a judicious choice of 2D speed factors (SFs) that are computed off the calibrated ID SFs 
of the USLV(2,0) model. 

The second method, which is applied below for the ITC-USLV version of our model but in 
principle could be used for both versions, is to "break the symmetry," and make the process 

T t = {T^jT^j parametric in one dimension (e.g., T^), while keeping it nonparametric in 
another dimension (resp., T^). The idea here is that for the purpose of calculation of finite- 



with this approach. 

21 To the extent that one-step methods, such as the Runge-Kutta method, can be viewed as particular ways 



to compute matrix exponentials (see Moler & van Loan (2003)), what we mean here by "direct" calculations 



are other methods of computing matrix exponentials that might in some cases be more efficient than one-step 
methods. 



( 



22 Recall that by fD and 2D, we mean linearized spaces obtained from the pairs (z^~\ zj- 2 j an d 
Y^', Y^) by taking elements of pairwise Kroneker products as new ID bases. In terms of factor counting, 



t ' t 

our ID and 2D Markov chains correspond to the two- and four-factor model specifications, respectively. 
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time transition probabilities in the Z-space, we can perform averaging over the randomness 
due to analytically (or semi-analytically) once a tractable model for subordinator is 
specified. The averaging over the residual randomness due to TJ. is performed numerically. 
Similarly to the previous case, this calculation can be done in a nonparametric setting, where 
at each step on our sparse time grid, we introduce just enough free parameters to match 
observed quotes for options maturing at this time. Differently from the previous case, the 
recalibration to option quotes in the present setting amounts to a (convex) optimization 
problem in the dimension equal to the number of option quotes for this maturity. 

The two flavors (AR and ITC) of our USLV(2,2) model outlined above thus offer a certain 
trade-off in terms of complexity. For the AR-USLV(2,2) model, the recalibration is fast for 
one step, but complexity scales linearly with the number of time steps on a dense grid; 
i.e., the complexity is 0(N d ). For the ITC-USLV(2,2) model, the complexity is 0(N s N c ), 
where N s is the number of nodes on a sparse time line, and N c is the number of option quotes 
per node, independently of At, but the 0(N C ) part above involves convex optimization in 
dimension N c . Based on previous experience with similar models, we expect a compatible 
performance from the two versions of the USLV(2,2) model, at least for typical cases (e.g., 
iV c = 5, N t = 40). Therefore, in what follows we will present both versions of the model. 

Our plan for the reminder of this paper is as follows. In the rest of this section, we describe 
the AR-USLV(2,2) version of the model, where the Mar kov pair is (Z t , Y t ), with a bivariate 

spot variance driver Y t = [Y^jY^j . In Sect. 6.1, we provide a qualitative overview 

of this version of the model. The following subsections of Sect. [6] provide details of our 
approach. The ITC-USLV(2,2) version of the model, where the Markov pair is (Z t , T t ) with 

T t = ^t/^jT^ 2 ^ being a bivariate subordinator, is presented in Sect. |?| As will be shown 
below, calibration to observed option prices amounts, in this approach, to a construction of 
an implied time change (ITC) process. Within a particular approach presented in Sect. [7| 
the ITC process is defined in terms of a bivariate exponential-Levy process h t = (T t , 9 t ) 
where T t is a parametric subordinator (e.g., an exponential gamma process), and 9 t is a 
nonparametric subordinator. The latter will be referred to as a time dilaton process, for 
reasons explained below. 

6.1 Overview of AR-USLV(2,2) 

As was mentioned above, a model obtained from our USLV(2,0) model by adding new state 
variables (in this case, spot volatility drivers Y t = {Y^, Y^ 2 ^) would not in general match 
observed option prices, even if our initial USLV(2,0) model does. Moreover, for any particular 
parametric model for the dynamics of the pair (Yp^, Y t ^j ? we are still not guaranteed that 
the full model could accurately fit available option quotes even after calibration of parameters 
of the F-process. 

In order to reinforce a nearly exact calibration to options for all consistent sets of quotes, 
we introduce 2D speed factors (SFs) S(z, y, t) in the full (2,2) model, that play an analogous 



role to ID SFs Eq.(36) in the (2,0) version of the model. We then provide a fast scheme to 
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compute 2D SFs based on solving the forward equation on the Markov chain. Our method is 



similar to one used by Britten- Jones & Neuberger (2000) (BJN), see also Rossi (2002), for a 



lattice-based stochastic local equity model in a (1+1) setting. A similar method was used in 



the BSLP model by Arnsdorf & Halperin (2007) for modeling dynamics of credit portfolios 



For a similar method used for equity option pricing, see Ren et al. (2007). 

A peculiar feature of a BJN-like forward induction method (to be presented in detail 
below) is that it tries to adjust the Z-process for any Y-process. It does not address the 
problem of calibration of parameters of the Y-process itself. In certain situations, it might 
make sense to try to calibrate parameters of the Y-process as accurately as we can before 
adding a local volatility layer (so that our change to a parametric model due to introduction 
of a local volatility would be a minimal tweak of the model). Alternatively, we could try to 
fit parameters of the Y-process after we introduce the local volatility layers, but before we 
compute the 2D SFs. This might be an attractive option for a practical method of model 
calibration in our setting. The reason is that if such parametric calibration of the Y-process 
produces a good but not perfect fit to the data, the role of non-parametric 2D SFs of the 
(N = 2, M = 2) model would be to perfect quality of calibration at the price of adding some 
nonparametricityp^l 

Note that while the BJN-like approach does not by itself address the problem of cali- 
bration of parameters of the Y-process for a parametric specification of the dynamics of Y t , 
this is where we could use Laplace transform based methods for stochastic subordinators, 
similar to the method presented in Sect. [7] in a slightly different setting. This implies that 
the calibration method presented later in this section and a method presented in Sect. [7] 
can in practice be used together for a joint parametric/nonparametric calibration of the 
AR-USLV(2,2) model. 

Yet another possible way to calibrate our model would be as follows. If a trader has 
a strong view on the relative weights of a spanned (delta-driven) and unspanned (genuine 
vega) contribution to options' vegas, and wants the model to behave accordingly, this could 
be achieved as follows. Assuming that we are able to map constraints like those onto some 
typical behavior of the set of SFsj^] we first fix some set of ID SFs, and then calibrate 
parameters of the Y-process given these SFs. After parameters of the Y-process are specified 
in this way, we proceed in the regular way of calibrating the model, by first computing the 
"true" (market-implied) set of ID SFs, and then following the forward calibration of 2D SFs 
in the full-blown model with parameters of the Y-process just computed at the previous 
step. Again, a combination of various methods presented below can be used to implement 
such a calibration strategy. 

As a brief summary, our QBD Markov chain stochastic-local volatility offers substantial 
flexibility in how the model can be calibrated to available market and/or historical data. Dif- 
ferent steps/versions of the calibration procedure can be combined (or skipped), depending 
on the specific needs of an end user. We now proceed with describing our framework. 



23 We hold a view that nonparametricity is "evil," but it is a "common evil" in the sense that it is used 
everywhere (for term structure calibration, local volatility models etc.). 

24 Such dependence can be established either theoretically, or empirically on the basis of behavior of the 
model as a function of model parameters. 
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6.2 QBD Processes and Stochastic Time Changes 



We prefer to think of stochastic volatility in terms of a stochastic time change of some "base" 
process such as a Brownian motion. (See Sect. [7] for more details and relevant references.) 
As our original two-factor (N = 2) diffusion equation, Eq.(37), has two Brownian drivers, 



W t {1) and W t {2) , we can use two different stochastic clocks on them. This would amount to 
having a stochastic local volatility model with N = 2 and M = 2. 

Such formulation can be useful for asset classes where the short- and long-term option 
volatilities typically behave differently (e.g., have different typical levels or vol-of-vol), in 
addition to a different behavior of Z r factors driving short- and long-term prices of basic 
instruments (bonds, futures etc.). For example, for modeling commodity derivatives, we 



might want to have one long-term factor driven by a Brownian driver W^' with its own 



r(l) 



stochastic clock (stochastic volatility) driver Y t , and another, short-term factor Z\ A} driven 
by another (possibly correlated) Brownian driver W{ , with its own stochastic clock driver 
Y^j^] This results in a four-factor scenario with correlated long- and short-term factors, 
each having its own stochastic volatility driver. 

The above picture of two curve drivers each having its own stochastic clock is not lost 
in our discrete-space Markov chain construction. As we will show next, the structure of our 
QBD Markov chain Eq.(45) for the N = 2 case enables introducing two stochastic clocks in 



,(2) 



the model in an internally consistent way, and without any need of introducing additional ad 
hoc constraints on the model dynamics. These stochastic clocks will modulate two Markov 
chain generators. As the latter play the role of stochastic drivers in the discrete-space 
setting, the resulting "ecosystem" of (discrete-valued) curve and volatility factors bears a 
strong structural similarity to its continuous-space counterpart. 



To explain our construction, we start with representing the Markov generator Eq.(45) in 
the following form: 
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(52) 



where F® = diag + diag (B®l) and = L {i) + F {i \ with 1 being a vector of 

ones. Using Eq.((4T|) and Eq.(46), it can be readily checked that both A\ and A% defined in 



Eq.(52) are valid generators in the sense that for both, all off-diagonal elements are positive, 



all diagonal elements are negative and all rows sum up to zero. 

This can be interpreted as follows. The second generator A 2 corresponds to an idiosyn- 
cratic component of the Z-dynamics that is independent of the rest of the system, and can be 



25 Alternatively, correlated dynamics of two stochastic drivers with each one having its own stochastic 
volatility factor can be used for pricing hybrid derivatives. 
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(2) 

thought of as describing idiosyncratic jumps of Zf' that occur without a simultaneous jump 
of Zf on the same time interval, 26 In terms of representation of stochastic dynamics of 

— (2) J 

our system, this generator is an avatar of the idiosyncratic Brownian driver in Eq.(37). 
The first generator A± describes joint jumps of Z t and nf \ and can be thought of as an 
avatar of the common Brownian driver W t in Eq.(37). 

As shown in Appendix [Bj a random time change of a continuous-time Markov chain 
amounts, in terms of the resulting Markov generator for the chain, to scaling all elements of 
the original Markov chain generator by a common factor given by the value of the activity 
rate (intensity of the time change) process. 

As they conserve probability separately from each other, two generators A\ and A 2 can be 
seen as representing two different subsystems of our dynamic system in the {Z^\ Z^^-space. 
As a time change acts as a common scaling factor on a generator matrix (see Appendix 
B), this implies that we can subject two generators, A\ and A2, to different stochastic 
clock changes without any problems with laws of probability: after separate time changes, 
probabilities are still all nonnegative, and sum up to one in each subsystem separately. This 
gives rise to a discrete-space version of a continuous M = 2 stochastic volatility model. 

To summarize, the (2+2)-factor structure of the original continuous-space system Eq.(37) 
(i.e. two factors for the term structure and two factors for the volatility) is now naturally 
mapped onto a corresponding structure in our Markov chain model, where a QBD process 

is an avatar of a two-dimensional Brownian motion (wp , W} 2 ^\ and two volatility factors 

are separately introduced as two stochastic clocks for two generators A\ and A 2 as explained 
above. We now discuss specific realizations of this scenario in our model. 



6.3 Forward Equation and Transition Probabilities in USLV(2,2) 

We assume discrete dynamics of stochastic intensity (stochastic volatility) drivers Y t , with 
an odd number N y = 2q + 1 of discrete states for each driver y} 1 \y} 2 \ Points on a two- 

dimensional F-grid are denoted as < F a \ , Ya 2 > . The initial values (Y t= Q, Y t= o) Carre- 

ls J Ql,cr2=0 

spond to the midpoints (Yg , Yq 2 ^) on the grid. 

In practice, we prefer to keep a low number of states (say 3 to 11) per volatility factor. 
As volatility is unobservable, we feel that maintaining a low number of states might suffice 
to reproduce most important stylized facts about stochastic volatility such as mean reversion 
and/or volatility clustering (persistence), alongside its role in providing a better behavior of a 
forward smile (a non-flattening smile for longer maturities) than typical behavioral patterns 
observed with local volatility models. 

To ease the notation, in this section we use Latin indices i,j, k to enumerate states 
(Z( = Zj, Z t+ dt = Zj etc.), and Greek indices a, (3 to enumerate values of Y t , Y t +dt- However, 
because we deal with a two-factor setting, both the indices and factor values are now two- 

26 This can also be viewed as a one-dimensional orthogonal projection of two-dimensional dynamics of the 
pair (Z^\ z[ 2 ^ ) onto a subspace where no jumps in variable X — are allowed. 
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component vectors rather than scalars; for example, 



Z i = (ZW,Z®) , i = (h,i 2 ), h,i 2 e Z+ (53) 



A similar representation is used for volatility drivers Y a = (Yal\Ya^ . In what follows we 

use both the vectorized and component notations. 

We postulate that the 2D dynamics of the pair (Z t ,Y t ) (where both factors Z t and Y t 
are two-dimensional) in the USLV model is Markovian. The system is defined in terms of 
the joint marginal probabilities 

7i(j,a,t) = P[Z t = Z j ,Y t = Y a ] 

and conditional transition probabilities 

Pia\jp{t, t + dt) = P [Z t+dt = Zj, Y t+dt = Y p\Z t = Zj, Yf = Y a ] . 

The forward equation takes the form 

ir(j,/3,t + dt) = t + dt)7r(i, a, t). (54) 

3, a 

The transition probabilities have the following expansion: 

Pia\jp(t, t + dt) = 5ij5 al5 + A ia \ jl3 (t)dt + O (dt 2 ) , (55) 

where A^jpit) is the Markov generator, and Sij = 8^8^ is the 2D Kroneker symbol (with 
a similar definition for 8 a p). 

To proceed, we introduce the following conditional probabilities: 

pj?\t,t + dt) = P[Z t+dt = Z i |Z t = Z i ,Y t = Y«], (56) 
Pap ' t + dt) = P [Y t +dt = Y^Yf = Y Q , Z t = Zj, Z t+ dt = Zj] . 

The joint probability Pi a \jp can now be written as follows: 

p ia \jp(t, t + dt) = P^\t, t + dt)P$(t, t + dt). (57) 



Using Eq.(55), we obtain 



P t { f (t, t + dt) = p ia \jp(t, t + dt)= % + A[f (t)dt + O (dt 2 ) , (58) 

P 

where 

A%\t) = J2 ^(t) , Yl 4 a) (*) = ( 59 ) 

P 3 

is the conditional generator of the Z-Markov chain. 
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It is convenient to write the second conditional probability in Eq.(56) in the following 
form: 



P%\t,t + dt) = 5 a p + Q%(t)dt, = 0, 



p(i,i+m) 



(60) 



P 

(t,t + dt) = 5 aB + Q { ™\t), £0*) = °' (m 1; m 2 )^(0,0). 

P 

Note that the term (t) in the second equation is not multiplied by dt because the second 



relation in Eq.(56) is not a transition probability but rather a conditional probability where 
we condition, in particular, on Zt+dt- If dZ t ^ 0, then dt cancels out in the calculation of 
the conditional probability. This means that Q^g(t) is not a real generator, but rather a 
"pseudo generator" introduced here to simplify formulae to follow. In its turn, this means 
that diagonal elements of Q£a if) cannot be made arbitrarily negative, as otherwise we would 
end up with probabilities reaching outside of the unit interval [0, 1]. 

Now we plug Eq.(55) and Eq.(58) into Eq.(57). This produces the following relation 
(here we omit O (dt 2 ) terms): 



+ A ia \ jB {t)dt = P®\t, t + dt) 8ij + A\J'(t)dt 



(61) 



A more explicit expression for the generator Ai a yB in terms of auxiliary generators A[j\ Q^b 
and <5a/3 can t> e obtained using the following identity (which can be checked by inspection): 



.4 



ia\jB 



v l — 5ij)Ai a \jB + (1 — 0~aB)Ai a \jB ~ (1 — <%)(! — S a 3)A ia \jB + SijS a BA ia \ ia . 



(62) 



Using Eq.(pip to evaluate different terms in the right-hand side of Eq.(62) in terms of the 



auxiliary generators, we obtain, after some algebra, the following general representation of 
generator A ia \ jB [t) of USLV(2,2): 



.4 



ia\jP 



w3> + (i 



3ij)QaB '-ii 



A-' + 5 



(63) 



Different terms in this expression are interpreted as follows, 27 

The first term SyQ^g is a generator of idiosyncratic jumps of Y t that proceed without 
a simultaneous jump of Z t in the interval [t,t + dt}. Various continuous-space models can 
be used as a means to parametrize this generator via discretization of the state space. For 
example, starting with a diffusive model for Y t , we end up with a tr idiagonal generator 
matrix Q^s- More details and examples will be given below in Sect. 



6.6 



The second term in Eq.(63) is a generator of joint jumps of (Z t , Y t ). Note that it is a 
valid generator on its own as long as YIb Qc*p = ®- Again, different specifications of this 



jB ^aB 

generator can be considered within our general framework. This will be discussed in some 
detail below in Sect. I6.TL 



We thank Leonid Malyshkin for proposing a decomposition of the Markov generator in such form, as 
well as for discussions that helped improve the presentation in this section. 
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Finally, the last term in Eq.(63) is a generator of idiosyncratic jumps of Z t that proceed 

It is determined by the conditional Markov chain 

It is special 



without a simultaneous jump of Y t . 

This generator plays a special role in our construction 



generator A^)(t). 



because the conditional Markov chain generator A^)(t) is the only generator in Eq.(63) that 
impacts prices of European vanilla options, while prices of exotic options will in general 



depend on all generators that enter Eq.(63). As will be explained in more detail below, this 
is due to the following relations that follow as long as Yip Q^p — and Yip Q^p = ; 



^,Pia\3f)(t) = 6 tj + A^\t)dt. 



(64) 



Note that the fact that theoretical prices of vanilla options computed in USLV(2,2) do not 
depend on specification of the other generators Q and Q in Eq.(63) has a few interesting 
implications. 

First, it suggests a nice "orthogonality" property of model parameters determining var- 



ious generators that enter Eq.(63), such that parameters driving prices of exotic options 



can be tuned (or picked) without impacting calibration to vanillas. If prices of some exotic 
options are available in the marketplace, this can be used to calibrate these two generators, 
after the model is calibrated to available vanillas. 

Second, in a scenario where no reliable pricing information is available for exotic options, 
we could use this property of the model in order to specify a measure of "exoticness" as, 
e.g., the amount the price of the given exotic derivative moves under certain functional or 
parametric tweaks of the first two generators in Eq. (|63| ). Given two exotic options and given 
tweaks to be performed on the generators in Eq.(63) (such as a common rescaling of all 



elements) while pricing both options, one of the options from the pair would in general end 
up being "more exotic" than the other. While these issues will be addressed in a future 
work, here we concentrate on the problem of calibrating the model to European vanilla 
option prices. 



6.4 Conditional Markov Chain Generator 

Clearly, prices of European vanilla options on a given underlying Z t for a set of options 
maturing at times T\,Tz, . . . are only determined by marginal distributions of Z t at these 
times. An equation driving evolution of marginal Z-distributions in the full USLV(2,2) model 
can be obtained by summing over (3 = (/3 1; (3 2 ) in the forward equation Eq.(54). We obtain 

ir(j,t + dt) = y^y^Piawit, t + dt)ir(i, a, t) = ^ + A\f(t)dtj n(i,a,t), (65) 

P i,a i,a 



where we used Eq.(64) for the last equality. This justifies the claim we made above: observed 
prices of European vanilla options impose certain constraints on the conditional Markov chain 
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generator A^\ but not on the other generators appearing in Eq.(|63|). Rearranging terms in 



Eq.(65), we obtain 



dt 



(66) 



An explicit expression for the conditional Markov chain generator A\"\t) can be obtained 



using Eq.(57): 



A 



(a) 



dt 



[P [Z t+dt = Zj\Z t = Z h Y t = Y a ] - <y + O (dt 2 ) . 



(67) 



From this point onwards, we reserve the notation for a calibrated generator of USLV(2,2) 



while using the notation A^ for an initial guess for A\J' . The latter is assumed to be a 
valid generator (obtained from some consistent model) that is not necessarily accurately 
calibrated to the observed market. In what follows, we will refer to the latter as a prior 
conditional Markov chain generator. While a particular functional relation between the two 
generators Affi and will be considered in the next section, in the reminder of this section 
we concentrate on specifying the second, "prior" generator A["\ 

As was outlined above (see also Appendix B), we define A^ as a combination of gener- 



ic 



ators A\ and A2 (see Eq.(52)), scaled by two components of Y t = ( Y t ^\ 



Af = Y^A l + Y^A 2 . 



(61 



Recalling the original definition Eq.(45) of the Markov chain generator, we can write this in 
a matrix form: 
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(69) 



where all matrices By , Fy are obtained by scaling of B^ l \ F^ by Ya^ : 



(i) 



B 



Y 



vWrW p 



(0 



while elements of Ly^ are scaled by Y^ , except for the diagonal elements: 



■(<) 



,(2) . (i) 
"2 



Y 



(2) 



02 



if k 

Y^F^ ilk 



3, 



(70) 



(71) 



where is defined in Eq.(52). Using Eq.(41) in Eq.(71), we obtain the explicit expression 

vi 2)£ #(4f;!feg) 2 if*=j±i 

-Ya 2 ^Sij(t) - ^ A 2 z l) - Y^Sij{t)^ if k 
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Clearly, conditional on values Yal > 0, diagonal elements of the conditional Markov 
chain generator Eq.(69) given by the second line of Eq.(71) are negative (as long as Eq.(42) 
holds), while all off-diagonal elements are positive, and the row-wise sums of elements in 
Eq.(69) are all zeros. Therefore, Eq.(69) is a valid conditional generator for any fixed val- 
ues of Yal\Yaa^ > 0. The first component Y^ modulates transitions between Z^-states 
(which may or may not be accompanied by transitions between Z^-states), while the second 
component Y^ modulates transitions between Z^^-states without simultaneous transitions 
between Z^-states. 



6.5 Fast Calibration of USLV(2,2) by ID Forward Induction 

In this section, we present a fast calibration algorithm that enables a recalibration of the 
full 2D USLV(2,2) model starting from a calibrated ID USLV(2,0) without using a forward 
induction on a full-blown 2D Markov chain. It uses a recursive procedure of "integrating in" 
the stochastic volatility process. 



Our method is similar to Arnsdorf & Halperin (2007). In 



its turn, a fast calibration method on a 2D Markov chain used in Arnsdorf & Halperin (|2007|) 
is similar to an algorithm originally developed by Britten- Jones and Neuberger (BJN 

Recalling our previous notation where we used symbols A and A for the calibrated and 
"prior" conditional Markov chain generator, we assume the following relation between them 



(t) 



[l-S^q^Y^A^it) 



8 



m^O 



(Y t ,t)A$(t). 



(73) 



Here qij(Y t , t) > are adjustment factors that will be used below to calibrate the full-blown 
USLV(2,2) modelp] Note that Eq.(73) defines a valid generator as long as 5y(Y t ,t) > 



and Aj£l(t) is a valid generator, as all nondiagonal elements of A K °j\t) are non-negative, all 
diagonal elements are negative, and all rows sum up to zero. 

The purpose of introducing the adjustment factors qij(Y t ,t) in Eq.(73) is to provide de- 
grees of freedom needed for calibration to option prices in the (2,2) model in a way similar to 
the way the ID speed factors were used above to calibrate the (2,0) model without stochastic 
volatility. As will be shown below, such calibration can be done in a numerically efficient way 
by reutilizing results of a previous calibration in a local volatility USLV(2,0) model. Note 
that after calibration of USLV(2,2) is done via a choice of multiplicative adjustment factors 
qij(Y t ,t), the latter can be combined with the ID SFs Sy(i) that appear in the "prior" 
generator A^(t) to produce 2D SFs Sij(Y t ,t). 



(«)/ 



This approach was later popularized bylPiterbarg (2006) under the name "Markovian projection." Note 



that both BJN and Peterbarg cite the work by Dupire on the link between stochastic and local volatility 
models. Dupire's approach seems to provide a common basis for both the BJN and Markov projection 
methods. 

29 The theoretical interpretation of adjustment factors qij(Y t ,t) is that they provide "risk-neutralizing" 
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To proceed, we first plug Eq.(73) into Eq.(66). This yields 
dir(j,t) 



dt 



]T [g«(Y t , t)A<g\t)ir(i, a, t) - Qji (Y t , t)A^(t)ir(j, a, t) 



(74) 



This can be compared to the forward equation obtained in the USLV(2,0) model where we 
have the following definition of the generator: 



p i:j (t, t + dt) = 6ij + Aijdt + O (dt 2 ) 
while the forward equation has the form 

dir(j,t) 



dt 



J2lAAtMht)-Mt)ir(j,t)}. 



(75) 



(76) 



Comparing Eq.(74) and Eq.(76), we find that marginal distributions ir(j,t + dt) in both the 
USLV(2,2) and USLV(2,0) models are matched at each node j = (ji, j%) provided we make 
ce for adji 

?«(Y t ,t) 



the following choice for adjustment factors qij(t) in Eq.(73): 

Aij(t)ir(i,t) Aj(t)J2 a n(i,a,t) 



Ea4 0) (*Mi,«,*) Z a A%>(t)ir(i,a,t) 



(a), 



(77) 



We now use our key result Eq.([77j) to set up a convenient and fast forward-induction method 
for the calibration of USLV(2,2) that utilizes the results of the ID calibration of the Z t - 
Markov chain of the USLV(2,0) model with a calibrated generator Aij(t), starting with a 
"prior" conditional generator A^\t) of the USLV(2,2) model. 

We assume that the ID calibration of the Z r Markov chain is performed as discussed 
above. We start with the initial conditions for the 2D and ID probability distributions, 
correspondingly, 

7r(z, a, 0) = 8#8a& , tt(z,0) = % (78) 

where % = (11,12) and a = (di,a 2 ) are indices corresponding to the initial values of Z and 

o,0): 



Yq (which we assume to be known), respectively. Using Eq.(77), we solve for fefY, 



%(Y 0) 0) 



^,■(0) 



A (6) (0) 



(79) 



Note that for i ^ i, the correction factors at time t = are undefined. However, this does 
not pose any problem as such states are unachievable at time t = 0, and therefore play no 
role in the dynamics. If desired, these parameters can be assigned some dummy values that 
would not have any impact on any numerical results produced with the model. 

Next we use the forward equation on interval [0, dt] to compute the joint probability 
7r(j, (3, dt), which is then used to compute the adjustments for all nodes at time t = dt, and 
so on. As a result, we have a full 2D USLV(2,2) Markov chain calibrated to the set of option 
quotes using a fast and effective algorithm. 
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6.6 Generator of Idiosyncratic Dynamics of Y t 

In this section, we provide some examples of specification of the first generator, dijQ^L, 



in Eq.(63). We recall that by construction of the model, the choice of this generator in 
USLV(2,2) has no impact on the quality of calibration of the model to prices of European 
vanilla options in a calibration set, while in general it does impact prices of exotic options 
produced by the model. 

As was mentioned above, for practical applications we typically have in mind a low (e.g., 
3 to 11) number of possible states per dimension of the stochastic volatility factor. This has 
two implications. 

First, we prefer to view continuous-space models as a convenient and compact way to 
rametrize the generator S^Q^p i n Eq.( 
moves of Y t without simultaneous moves 



of 



parametrize the generator SijQ^fl * n Eq.(63). This generator corresponds to idiosyncratic 

~7if Such parametrization is clearly preferred to 



directly specifying ~ 2(2g + l) 2 free parameters defining a discrete-state generator SijQ^. 

Second, as long as the number of volatility states is low, the generator matrix for Y t 
should not necessarily be sparse. This remark is important as nonsparse matrices arise while 
discretizing jump-diffusion processes. 

For simplicity, in this paper we restrict ourselves to a particular bivariate mean-reverting 
diffusion process as a continuous-space model that produces generator Q^l after a proper 



discretization, 30 More specifically, we consider a bivariate Ornstein-Uhlenbeck (OU) process 

,(*) _ i nrr yW : 



for the logarithmic variables y\ = log Y t 

dy (2) = k 2 (rj 2 - yf ] ) dt + u 2 dW t {2 \ (80) 



where the two Brownian motions and are correlated with correlation p y . 



The continuous-space Markov generator corresponding to Eq.(80) reads 



rv(\ h ' ^(y) ■ h < ^(y) , i 2 d 2 viy) , i 2 d 2 v(y) , gMy) 
C y V(y) = 6i(y)-^~ + ^(y)-^- + 2 Vl -djT + 2^^yT + ^Whjz' m 

where 

b l (y) = hU-y?) , i = 1, 2. (82) 



Note that parameters of generator C y can be made dependent on the value of Z t if desired. 

The F-generator can be discretized in a similar way to a procedure used above in Sect. |5j 
We use central differences for the second derivatives and a noncentral difference for the mixed 



derivative in Eq.(81). In addition, we use the upwind-difference discretization for the first 



derivatives in Eq.(81). Regrouping different terms in the discretized generator much as it 



was done above in Sect. [5j the resulting discrete Markov chain generator for Y t can be cast 

30 If any other model of stochastic volatility is chosen instead of the one presented below, the only change 
needed in the present framework would be to construct different transition matrices (or generators) for the 
Y-states, while the computational part of calibration and pricing would stay the same. 
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in the form of another QBD process, in an analogous way to our construction of the QBD 
process for Z t . The resulting generator takes the familiar QBD form (compare, e.g., with 



Eq.(69)): 
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where all matrices B^\L^\ F^> have dimension (2g + 1) x (2g + 1), i.e., the dimension of 
our one-dimensional grids. 

We would like to conclude this section with a few remarks. Using a diffusive prototype 
is certainly not the only way of constructing the idiosyncratic generator in Eq.(63). 
Alternatively, we could consider more general jump- diffusion or Levy processes as continuous- 
space prototypes of the generator. For such more general specifications, generator Q^\t) 
would not in general be sparse. Note, however, that a potential nonsparsity of the generator 
in the latter case would not be a major concern if we have a small number of states for Y t . 

Furthermore, as was mentioned above, a BJN-like fast ID calibration procedure for AR- 



USLV(2,2) presented below in Sect. 6.5 can also be applied when instead of spot variance 
factors Y t , we use integrated variance drivers/subordinators T t . The only difference from the 
case of spot variance factors just presented would be that generators for subordinators should 
have all zeros below the diagonals (as a subordinator should be a non-decreasing function of 
time). However, both the calibration and pricing algorithms would stay the same. 



6.7 Generator of Joint Jump Dynamics 



Here we consider the second term (1 — Sij)Q^g A\"' in Eq.(|63|). We recall that this expression 



defines the generator of joint jumps whose matrix elements give intensities of simultaneous 
jumps of Z t and Y t . 

The motivation for introducing joint jumps of Z t and Y t is to incorporate the asset- 
volatility codependence 31 in our framework. We note that in a discrete-time setting, both 



Britten- Jones & Neuberger (2000) and Rossi (2002) introduce different transition matrices for 



the F-states for different transitions between the Z-states in an ad hoc way. Our approach, 



which starts with a continuous time dynamics and the decomposition Eq.(63) of the Markov 



generator, is hopefully a bit more systematic and easier to relate to one's intuition. 

As the conditional Markov chain generator Aff is fixed by the above procedure of cal- 

the joint jump generator is specified by defining the pseudo- 



ibration to vanilla options, 

generator Q^g(t) (see Eq.(60)). Much as we did in our approach above for the idiosyncratic 
y-generator, here we settlefor a very simple and parsimonious choice. Alternative and 
more complicated specifications of the pseudo-generator (t) could clearly be considered 
instead without significantly affecting complexity or performance of the model. 



31 



Such as the well-known leverage effect (a negative spot-volatility correlation) for equity markets. 
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Specifically, we assume that conditionally on yAZ^\ AZj® j , jumps AY t {1) and AY t {2) 

are independent. We further specify that jump AY^ depends only on AZ^ rather than on 

the pair (AZ^\ AZf^j, and similarly AY^ depends only on Azf\ To control the amount 

of the asset- volatility codependence in our model, we introduce two parameters, 71 and 72, 

that determine the codependence for the joint moves {AZ^\AY^\ and ^Azj: 2 \ AY} 2 ^ 

respectively, such that 7$ > and 7i < correspond to positive and negative codependences, 
respectively We then define the pseudo-generator Q^s(t) as follows 
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<Wi + (7i m i 
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1 A ai/3i 
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(2,-)l 



»4) 



<W 2 + (l2m 2 ) A y a ^l + (7 2 m 2 ) -6^6^, (mi,m 2 ) ^ (0, 0), 



where for any real number 2 we defined x + = max(0, x) and x~ = max(0, — x). Matrices 
and in Eq.(84) stand for some upper- and lower-triangular generator matrices, 
respectively. For example, a nonsparse specification of generator Eq.(84) could be provided 
using two parameters, (71,(72 < 1, that determine the speed of decay of the generator away 
from the diagonal: 



A 



(*,+) 

a/3 
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1,2, 



(85) 



where 0(x) stands for a Heavyside step-function. Alternatively, if we want to keep the 
generator sparse, we could consider bi-diagonal specifications for matrices A^ %,± \ 



7 ITC-USLV(2,2): Implied Time Change Process 

The algorithm of forward induction-based calibration of the USLV model just presented 
is simple and intuitive; however, it is not ideal from a practical viewpoint, as it assumes 
that the time steps are small enough to justify the use of a trinomial (birth-and-death) 
approximation for the diffusion process in Z t . In practice, this means we should take daily 
(possibly hourly) steps, which may slow down calibration and pricing. If we want to be able 
to have a lattice with larger steps (e.g., monthly), or work with irregular large steps, we need 
a different method. 

Several alternatives of different complexity can be considered at this point. One approach 



would be to generalize Eq.(73) to the case of larger time steps At by viewing the left- and 
right-hand sides of Eq.(73) as leading terms in expansions of finite-time matrix exponentials 
of the conditional (on a realization of Y t ) generator of the QBD Markov chain for the 
calibrated and "prior" model, respectively. While it can be shown that the recursive forward 



42 



This parameterization was proposed by Leonid Malyshkin. 
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calibration can be carried over in such framework as in the one-step BJN method described 
above, practical uses of such an approach may be constrained by our ability to compute 
conditional multi-step transition probabilities for Z-states in a numerically efficient way. We 
expect that splitting methods can be efficiently used to this end, but we leave research in 
this direction for a future work, and instead concentrate on alternative approaches. The 
latter are based on stochastic time change techniques, which is what we present next. 



7.1 Modeling Stochastic Time Changes 



Let £ t be a (random) matrix- valued value of the QBD process with generator Eq.(45) at 
time t. Consider a right-continuous nondecreasing process T t with tq = with independent 
and homogeneous increments. 33 In the present context, such process T t is called a Bochner 



subordinator, see, e.g., Feller (1968). 



Now consider a new process rj t = £y t given by our QBD process Eq.(45) evaluated at a 



random (business) time T t instead of the calendar time t. This produces a QBD Markov 
chain subordinated to the Bochner subordinator T t . 

Note that the idea of a subordinated Markov chain has already been used above in 
Sect. 5.1 (see also a discussion on this point in Gross & Miller ( 1984[ )) as a computational 
tool for evaluation of matrix exponentials of generator Eq.(45). A more general subordinator 



is given by a nondecreasing Levy process; see, e.g., Carr et al. (2003) and references therein. 
It can be written as 



Y s ds + T y 



(jump) 



{jump) 



(86) 

stands for a jump 



where Y t is a nonnegative process called the activity rate, and T t 
component of the time change. 

Many specifications of a time change process can be described by the general formula 



Eq.(86). For example, subordination by a Poisson process was used above in Sect. 5.1 which 



corresponds to the Bochner subordinator being a pure jump process with a finite jump 
activity. Another simple choice for a pure jump time change would be a gamma process 
(incidentally, this process has a particularly simple Laplace transform). Alternatively, one 
can consider purely diffusive time changes where T^ ump ^ vanishes, with the activity rate Y t 
specified, e.g., by a CIR process or a positive OU process. 

Let us assume that the stochastic time change is independent of the QBD Markov chain, 
and that its Laplace transform 

£ Tt (u)=E[e~ uT <] (87) 

is known in closed form, or can be computed numerically at a low cost. Consider first a 
one-factor stochastic time change for a single Markov chain with generator A. Recall that 
finite-time transition probabilities can be computed using the randomization method as in 



That is, T t+S — T t is independent of the filtration T% and has the same distribution as r s . 
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Eq. ( 50 ) , which we write here as 

oo 

Pit) = Poe- Xt £ 



(At) n P r 



nl 



n! 



dA/ 



n=0 n=0 

Substituting here t T t , taking the expectation with respect to future scenarios of the time 



change process T t , and interchanging the summation and expectations in Eq.(88), we obtain 



E [P(T t )) = 



n=0 



111 



_ p'pn 



X- 



dX 



C Tt (X). 



19) 



Note that in practice, randomization methods truncate an infinite series in Eq.(88) at some 
finite n max determined by a needed accuracy level e. If we first truncate the series for fixed 
t given e, and then do a stochastic time change, this might lead to a substantial loss of 
accuracy. One possible way to achieve a fixed-e calculation for a given t in the model with 
stochastic time T t is as follows. We first find a maximum value of T max = max T T t that can 
be reached at some confidence level, and then truncate the sum at some value n' nrr . that 



provides needed accuracy for Eq.(88) where t is replaced by r, 



This means that if derivatives of the Laplace transform of the time change are easy 



to compute, and the number of terms we need to keep in Eq.(89) for given tolerance is 



reasonably small, then the problem of parametric calibration of parameters of the F-process 
in the USLV(1,1) (or USLV(2,1), see below) models can be solved, in the zero-correlation 
limit, in one step, with no need for a forward induction algorithm. 

Note that while the assumption of zero correlation might be restrictive, the above ap- 
proach can in fact be generalized to the case of nonzero correlation, at the price of introducing 
a complex- valued measure (see Carr & Wu ( 2004[ )). This could be used to calibrate param- 
eters of the Y process for a given set of option quotes while keeping the ID SFs fixed or 
flatj^] Further improvements of calibration quality (if desired) could then be achieved using 
a forward- induction-based calibration method described in Sect. 16.51 

For the most interesting two-factor time change specification (i.e., for USLV(2,2)), the 
situation is more tricky because of correlations between different drivers, as well as because of 
noncommutativity of generators A\ and A2. However, as will be shown in the next section, 
it turns out that a tractable framework can be obtained with a proper construction of a 
two-factor time change. 



7.2 Time Change with a Hierarchical Bivariate Subordinator 

It might be tempting to try to extend our framework by incorporating stochastic time changes 
with a nonvanishing jump component T^ umpS} in Eq.(86), with a two-factor time change 
T t = {T^ 1 \t^). If ,t/ include jumps, this leads to jumps in the underlyings Z t = 



34 Alternatively, we could use the zero-correlation limit as a "quick and dirty" way to estimate the pa- 
rameters of the full model with nonzero correlation, except of course those parameters that are critically 
dependent on the level of correlation. 
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(^zj: 2 \ zj: 2 ^. If both the nonvanishing activity rate Y t and a jump component T^ umpS) are 

retained in Eq.(86), this results (in our setting) in a four-factor stochastic-local volatility 
dynamics with jumps in both the under lyings and volatility. 

Now we introduce such a two-factor stochastic time change with a bivariate subordinator 
and show how to evaluate finite-time transition probabilities in a resulting subordinated 



Markov chain using a generalization of the randomization method presented in Sect. 5.1 



Recall from Sect. 6.2 that the Markov chain generator A in our problem has the decompo- 



sition Eq.(52), where both A\ and A2 are valid generators that can be subject to individual 



stochastic time changes. Consider a bivariate subordinator of the following form 

T t =l^) = (°'J'). (90) 



Here 9 t > is a stochastic process with E,[6 t ] = 1 that will be specified in more detail below. 
We assume that 9 t is independent of T t . As 6 t acts as a time dilation factor on top of the 
random time T t , we will refer to 9t as a time-dilaton process. Note that correlation between 
and is now driven by the variance of 6 t : 



= I Var ( T t) f91) 

/7 " l "'" J ' W Var(T t ) + Var(9 t )(Var(T t ) + (E[T t ]f) 

so that we can fit any nonnegative correlation between Tj and by a proper choice of 
Var(6 t ). 

We assume that (T t , 9 t ) is a 2D Markov process with independent and time-homogeneous 
increments. Furthermore, we assume that both T t and 9t are non-decreasing exponential- 
Levy processes. As a product 9 t T t of two (non-decreasing) exponential-Levy processes T t and 



6 t is another (non-decreasing) exponential-Levy process, Eq.(90) defines a valid subordinator 



that can be used to time-change our QBD process with generator Eq.(52). 



The interpretation of the bivariate subordinator Eq.(90) is as follows. The second compo- 



(2) 

nent T t = T t provides a common time change that modulates all transitions on the chain; 
i.e., it acts on both generators A\ and A 2 . The first component = 9 t T t can be thought 
of as a hierarchical time change. In this hierarchical scheme, we first apply a common time 
change T t , and then time-change it again using a linear time change function T t (7*) = 6 t T t . 
This time change will be applied below to generator A% alone ^ Note that if both T t and 
8t are non-decreasing exponential Levy processes, this implies that the stochastic clock runs 
faster for transitions involving changes of both and Z^ than for transitions that only 
involve changes of Z^ 2 \ Also note that while 9 t is a stochastic process, in the context of 
calculation of finite-time transition probabilities on a fixed interval t e [0, t] (where t is some 



35 Note that the order of the time changes indicated above is very important: if we reversed it, this would 
be equivalent to allowing future events of transitions driven by Ai to impact the dynamics of the whole 
system at the present time. For a recent application of such hierarchical time changes, see, e.g., Puzanova 

poTTl). 
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"interesting" time, e.g., a coupon date) what matters is only a terminal value 9 t (see below 
in Eq.(92)). Therefore, for such calculation we can treat the terminal value 9t as a random 
variable 36 which certainly simplifies an approach presented below. 



The finite-time transition probability for a time-changed QBD Markov chain can now be 
computed by conditioning on the realization of 9 t : 



P(t) 



E 



E [ P y^ t A 1+ T t A^ = jg. [ E [p/ e T t (^ 1+ A 2) ] | 0j 



E [E [P' Q e TtAe \ 9 t ]] . 



(92) 



Here the outside expectation corresponds to averaging with respect to the randomness due 
to 9 t , while the inner expectation averages over the randomness due to T t for a fixed value 
of 9 t . In the last equation, we have defined Ag = 9 t A\ + Ai for any fixed 9% = 9. 

Now consider the inner expectation in Eq.(92). For any fixed 9 t 
follows 



9, we proceed as 
Next, we use the 



First we specify a nonnegative parameter Ag > max n \(Ag) m 
idea of the randomization method of Sect. I5. II to define a DTMC with transition matrix 



1 + 



Ag 



A, = A, (P e - I) . 



(93) 



Substitute A as given by Eq.(93) into the solution of the forward equation, which we write 



here as Pg(t) to emphasize that the whole calculation is done for a fixed 9t = 9: 
Pg(t) = E [Py tAe \ 9 t ] = E [Py* A ^e-I)\ flj = E [ e -A^p. e T t A,P 9 | 

Using a Taylor series expansion for the matrix exponential in this expression and interchang- 
ing the summation and expectation, we obtain 



( p'~pn\ °° 

Pe(t) = £ ^r^ E l(mre- A ^\9 t ] = 



n=0 



n=0 



-l) n f d xn 



, (94) 



A=A fl 



where CtXu) stands for a Laplace transform Eq.(87) of the time change T t . 

Eq.(94) is a "semi-analytical" expression for a finite-time transition probability in our 
Markov chain after the first time change driven by T t , but before the second time change 
driven by 9t- The product P^Pg can be efficiently implemented via a recursive vector-matrix 
multiplication as in Eq.^. The derivatives (A^)™£ Tt (A) can be easily computed if the 
Laplace transform £t ( (A) is known in closed form (or can be computed numerically at a low 
cost). Note that this part of the calculation is independent of any pricing data — the latter 
only impacts the matrix Pg through a calibrated set of ID SFs Sj. 

As an example, consider an exponential-gamma subordinator specification for T t = T t 
with T t = exp(X t ), where X t is a Gamma process. Recall that an univariate (homogeneous) 
Gamma process X t > with X = and parameters a, b G M + is a process with independent 



36 This is due to the Markov property of the # t -dynamics which was assumed above: For a continuous-time 
Markov process, a time line can be chosen in an arbitrary way, while the corresponding finite-time transition 



probabilities would be related by the Chapman-Kolmogorov equations; see, e.g., Feller (19681 
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increments such that X t is Gamma-distributed T(at, b) with the following probability density 
function (pdf): 

Ub(x) = x ^e- bx l M+ (x), (95) 

with 

E[X t ] = j, Var[X t ] = j±. (96) 
Here the parameter a is called the shape parameter, and b is the rate parameter. In what 



follows, we set a = b = 1/u. The Laplace transform of Eq.(95) reads 



C Xt {u) = (1 + (97) 

Given this expression, we can approximately compute the Laplace transform of T t = exp(X t ). 
All rescaled derivatives (A^) n £T t (A) would then have to be computed from that latter 
Laplace transform. 

After the condtional time-t probabilities are computed, the unconditional probabilities 
are obtained by averaging over a marginal probability density pt{9) of 9 t : 

Pit) = E \P e {t)} = J d9p t (9)P e (t). (98) 

In practice, this integral should be computed by discretization of the range of 9 t onto a finite 
grid [0 o ,0i,...,0 ff -i]- 



Note that we can proceed in two different ways with a computation involved in Eq.(98) 



The first way would be to specify a process for 9 t , discretize it, and then compute a discrete 



approximation to Eq.(98). We could tune parameters of this process to fit a given set of 
option quotes. However, a particular parametric model of 9 t might be too restrictive for 
such task, especially if the number of option prices to fit grows larger. For this reason, in 
the next section we present a more flexible nonparametric approach that is able to fit any 
arbitrage-free set of option quotes. 

7.3 Implied Time-Dilaton Process 



For a particular parametric specification of a (discretized) time dilaton process 9t, Eq.(98) 
produces some transitions probabilities for the Z-states in the time-changed QBD Markov 
chain. In general, these transition probabilities would be different from marginal probabilities 
in the local volatility USLV(2,0) calibrated to observed option prices. This means that a 
nearly perfect calibration to options achieved in the USLV(2,0) model would in general be 
lost once we add a stochastic time change to our model. 

However, we can rematch the prices of options in our calibration set after a time change 
if we treat the distribution pt{9) of realizations of different values of 9 t as a distribution 
implied by option prices (given the specification of a process for T t ). Furthermore, for 
a multi-period setting specified by a particular time grid t ,ti, . . . (where can be, e.g., 
swaption maturities in the calibration set), we can construct an implied process for 6 t if 
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we impose a Markovian structure on it. In this section, we show how such process can be 
constructed using a Minimum Cross Entropy (MCE) method (see, e.g., Cover & Thomas 



( 2006[ )). Our construction is similar to Halperin] (2009) where analogous ideas were used in 
a different context. 

Let Fijk be a payoff function of the option with maturity tj and strike Kj (with j = 
1, . . . , K) in a scenario where the terminal value of the underlying at the option maturity 

is given by values (zj^, zj^\ and let CV,- be the corresponding market prices of options in 

our calibration set. Using Eq.(98), we obtain a set of constraint^' 



/ d9pi(0)Gij(0) = C i:j , Gij(0) = Fijk [Pek,k (U 



J 



0. 



(99) 



where k = (fci,/^) is an index corresponding to the initial value Zq. Note that Eq.(99) 
with j = corresponds to the constraint E[6* t ] = 1 implied above in Eq.(91), which is here 
enforced as an additional artificial option quote with j = 0, Fiok = 9t and = 1. 

We can now find a probability density p%{0) that satisfies these constraints using the MCE 
approach. With this method, given a reference ( "prior") model qi{0) (given, e.g., by another 
exponential-gamma process), we minimize the Kullback-Leibler (KL) distance between the 



two distributions Pi(0) and qi{0) (see Cover & Thomas (2006)): 



D\pi 



d9p l {9) log 



Pi{0) 
Qi(0) 



(100) 



subject to constraints of Eq.([99|). This produces a least biased (relatively to the reference 



measure qi{0)) distribution Pi{0) that satisfies the constraints of Eq.(99) 



For the first node on the time grid, minimization of Eq.(100) with constraints Eq.(99) is 
done using the method of Lagrange multipliers. Using Eq. ( 100 ) with i — 1, the corresponding 
Lagrange function is 



d0pi(0)log 



Pi(0) 



(i) 



d0 Pl {0)G lj {0) - Cij 



(101) 



where ^ are Lagrange multipliers. Minimizing this expression with respect to pi(0), we 
obtain 



Pi(0) = y-qi{e)e 



1 E^^GyW 



Z 1 = / d9 qi {9)e^ 



(102) 



The Lagrange multipliers can now be computed by plugging Eq.(102) back into Eq.(lOl) 



and maximizing the resulting expression as a function of {£j }• This amounts to a convex 
optimization problem in dimension equal to the number of option quotes. (For more details 



on the MCE method in both the one- multi-period settings, see, e.g., Halperin (2009) and 
references therein.) 



Note that we use the continuous notation here for simplicity of presentation only. For implementation, 
all stochastic processes arc discrctized within our approach. 
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For the second maturity, instead of minimizing the unconditional KL distance Eq.(lOO), 
we minimize a conditional KL distance for the next interval. This is done as follows. Using 
the Markov property we can write the pricing constraints as 



G, 



2J 



d9 2 p 2 {9 2 )G 



d6 2 G 



2j\V 2 ) 



de lPl (e 1 ) P (9 2 \9 1 ). 



(103) 



Assuming that the density pi(9i) is fixed at the previous step, the conditional transition 
) can be found by mir 

#[p(0 2 |0l)|l#2|0l)] 



density p(9 2 \9i) can be found by minimization of the expected conditional KL cross entropy 38 

P(0 2 |0i) 



dOi pi( 



d9 2P (9 2 \9 1 ) log 



q(9 2 \9 1 



(104) 



subject to pricing constraints Eq.(103). Here q(9 2 \9i) is a prior transition probability. As 
the time change T t should be nondecreasing, it should satisfy the condition q{9 2 \9i) = 
for 9 2 < 9\. Again, a natural choice for the prior transition density could be the transition 
density of an exponential-gamma process. 

The corresponding Lagrange function for the second interval is 



d0 x pi{ 



d9 2 p{9 2 \9 l )\og 



p(9 2 \9 1 



(2) 



q(92\9 1 ) 

d9 2 G 2j (9 2 ) I d9 lPl (9 1 )p(9 2 \9 1 )-C 2tl 



(105) 



(2) 

where Q are Lagrange multipliers enforcing the constraints in Eq.(103). Minimizing this 
expression with respect to p(9 2 \9i), we obtain the conditional transition probability 



P(9 2 \9 1 ) 
Z2{9^ {2) ) 



q(9 2 \9 1 )e^^ G ^ 



Z 2 (9 1} e 2) 

d9 2 q{9 2 \9 l )e 



(106) 



Note that p{9 2 \9i) < if 9 2 < 9\ (i.e., our "true" time-dilation process is a valid subordinator) 
as long as our prior model q{9 2 \9i) is a valid subordinator. 

Substituting Eq.(106) into Eq.( |105 ) (and flipping the sign to convert a maximization 
problem to a minimization problem), we obtain the following function U(^ 2 ^) (sometimes 
referred to as a potential function): 



U(^) = J d9 1 p(9 1 ) logZ 2 (^,e (2) ) - l^ 2) C 2j 



(107) 



(2) 

The problem of computation of the Lagrange multipliers Q is now reduced to minimizing 
Eq.(107), which again amounts to a convex optimization problem in dimension equal to the 
number of option quotes for maturity t 2 . 



The conditional KL cross entropy is a measure of the difference between two conditional transition 
probabilities, averaged over the position of the initial point; see, e.g., Cover & Thomas (2006). 
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For a multi-period setting with more than two nodes on a time line, the above scheme is 
applied recursively. Let ti, ti, ■ ■ ■ , ijv be nodes on the time line. We first solve the problem for 
the pair t\,t% as described above. Using these results, we next calculate marginal probabilities 
/ (62) using the Chapman-Kolmogorov equations. Now the problem for the pair of times £3 
is treated in the exact same manner as above. We then move to the pair t 3 , £4, etc. As a result, 
we end up with an implied discrete- valued process for 6 t on a discrete timeline £1, t 2 , ■ ■ ■ , t N . 
Derivatives pricing with this framework can be done using the standard backward induction 
method. 
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Appendices 



A 2D Markov Generator on a Nonuniform Grid 



In this appendix, we discuss how to construct the 2D Markov chain generator A in Eq.(34) 



using a nonuniform grid in Eq.(35). 



To simplify notation, we introduce hk,i = A^^.i),^ = Azk,(i+i,i), k — 1,2, i — 
0, . . . , pki where p\,p2 are the upper boundary of our discrete grid in the first and second 
dimensions. The central difference approximation of the second derivatives reads 



d 2 V 



dz\ 
d 2 V 



dz\ 



where 



s^Vi-u + slv h3 + a+ v i+1 ,j + o {hD) + o {(hi) 2 )) + o (/n^y) , (ios) 
, + ^y,j + st,v h]+1 + o {hD) + o {(hi) 2 )) + o (//.A;, !) , 

6° 



hk,i(hk,i + h 



KAKi + Kl 



At the boundaries these coefficients are 5 ki = 0, i = 1 and 5 ki — 0, i — pk- 

For the mixed derivative we take noncentral differences to preserve nonnegativity 



d 2 v 



dz 1 dz 2 



l 3 m=j—l 



For p > one has Ry = O^i^h^ -) + O(hi^hl) and 

1 



7i.i' — 



7i 



»>J-1 h h ' n >3 



+1' H,j+1 



7i. 



hi,ih 2 ,j ' 7iJ 



_ 

lij-l 



(h^ + hiMl 



h + h + ' 

1,» 2,j 



7i 7 --i =0, T 



Tij+l) 7ii'+l 



/ll, 
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For p < we find i2y = 0(h 14 h+) + <3(/i+ + Oih^hf^ and 



7ij-i 0; 7i,j 7ij'+D 7ij-+i 
1 



1 



1 



~0 = 

'2.7 II, J — 1 <l,J + l' <l,J + l 



h 2 ,jh{ 



+ ' 7«j 



"7i,j— 1 7i,j'+l5 7jj'+l 



Using this in Eq.(39) and regrouping terms, we obtain 



k,m={-l,0,l} 



where the following notation is used : 
1 



a ij\i+l,j 

a ij\i,j+l 
a ij\ij 



1 



2 : %02 (<7 2 #j + P0-l7?i+l) , 



a ij\i,j-l 



1 



where s 4j = [s(2" t )] y . 

To construct a valid Markov generator, we have to make sure that all off-diagonal elements 
are positive and all rows sum to zero. It is also necessary to obey the following property: if f n 
and f n+1 are the state vectors at time moment n and n+1, and A is the transition matrix (i.e., 
jn+i _ Af nS j^ then to preserve positiveness of /, the matrix A must be diagonally dominant. 
When applied to the above equations, these three conditions give rise to tricky dependencies 
between the grid steps h lti , hf { , h 2i , h 2i , which could be hard to reconcile with the usual 
approach of building a nonuniform grid based on expected values of model parameters. One 
possible approach that escapes the need to deal with exceedingly complicated constraints on 
the grid steps could be to use a nonuniform grid in one direction and a uniform grid in the 
other direction]^ 



This is similar to building space grids as a part of a FD approach to solving 2D PDEs that determine 



the option price under some stochastic volatility models. For more details, see, e.g., Toivanen ( 2010 ). 
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B Random Time Change of a Continuous-Time Markov 
Chain 

Consider a homogeneous Markov chain with a diagonalizable generator A such that 

A = UDU^ 1 , D = diag(di, d 2 ,..., d N ), 

where the eigenvalues {d^} are assumed to be in a descending order. The matrix U consists 
of eigenvectors stored column-wise. For a finite-time transition matrix, we then have 

P{t,T) = Ue^ D U-\ 

Next we make the transition matrix stochastic by introducing the random time change t — > T t 
driven by a nonnegative stochastic process (activity rate) Y t such that 

T t = [ Y s ds. (109) 
Jo 

By viewing T t as a "true" "business" or "trading" time as opposed to the calendar time t, 
the transition matrix becomes stochastic as it now depends explicitly on Y t : 

Px(t,T) = Ue D £ Ysds U- x . (110) 

Consider now a Markov chain obtained by conditioning on a path of Y t . By taking the 
derivative of Eq.(llO) with respect to t and comparing with the Kolmogorov equation 

dP x (t,T) 



di 



-A x (t)P x (t,T), 



we see that the conditional on the realization of the path of Y t , our process is given by an 
inhomogeneous Markov chain with generator 

A x (t) = Y t UDU- 1 = Y t A. 
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